/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101474151 & >>101464048

►News
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101474151

--Understanding Context Shifting in koboldcpp and its Differences with llama.cpp: >>101475054 >>101475102 >>101475238
--Optimal Placement of System Prompts and Character Cards in Mistral: >>101481173 >>101481456
--Putting on our thinking caps: comparing numbers with R and the impact of temperature settings: >>101483736 >>101484037
--Proper Formatting for System Messages: >>101479160 >>101479788 >>101479796 >>101479903
--Llama.cpp is way faster now with GPU offloading, and Anon seeks help with rope scaling: >>101476940
--Fixing EOS token issues with Mistral-Nemo's tokenizer: >>101480410 >>101480853
--Estimating concurrent users for LLMs: a cat in a computer case: >>101482136 >>101482335 >>101482481
--AI: A Slave to Our Lonely Needs, But Not a Replacement for Genuine Connection: >>101475534 >>101475665 >>101475932
--SillyTavern Template Implementation and Best Practices: >>101480417 >>101480751 >>101480853
--New Mistral's Impressive ERP Character Description: >>101478112
--Mistral-Nemo: A Surprising Contender for Best Local RP Model: >>101478725 >>101478769 >>101478932 >>101478952 >>101480297
--Gemma 27b Review: Sovl but Formatting Issues and NSFW Avoidance are Drawbacks: >>101474884 >>101475375 >>101483374
--Dark and Moody Depictions: LLMs and Sensory Deprivation: >>101476284
--Miku (free space): >>101476953 >>101485255

►Recent Highlight Posts from the Previous Thread: >>101474172
>>101487448
>her breath warm against your neck
>sly grin
>her voice a sultry whisper that sends shivers down your spine
>lips brushing against your ear
>nips lightly at your earlobe
>eyes locked on yours
>a smirk plays on her lips
>voice dripping with seduction
holy shit I've never seen a model this shiverslopped before
god damn what did they train it on
also lmao 236b parameters for this
Will LLMs forever suffer from gptisms?
Looks like it's going to be a waiting game for the Nemo finetunes to drop. Really curious if it's better than gemma 27b
>>101488159
WHY WOULD YOU THINK THAT. STOP BRINGING THAT UP.
>>101488159
Wait, so gemma 2 was better than 70B and now there's a 12B that might be even better than that?
>Nemo isn't free on OR and is priced near 4o-mini
Oh... Any new logs please? I promise to read the recent highlights when I wake up.
>>101488042
Fusing with Miku and Rin
>>101488157
they're trained recursively on their own generations now and are not trained solely for rp
draw your own conclusions
>>101488117
Pure GPT4 with an extra sloppy purple prose prompt.
>Wai man laiks roleplay. Let's add a lot of data for it.
>Yeah. Use GPT-fo, it's the smahtest, they will laik it.
>>101488175
It hurts, anon. The waiting hurts. I need to share the pain. I'm just generating image sets of my waifu, maybe that can help while you wait.
>>101488157
I like how "GPTisms" are just run of the mill literary cliches that you'd see in books and journals. It proves /lmg/ never read anything before they started gooning to llms.
>>101488376
The issue is that the models stuff them into their gens at every given opportunity like a 14 year old fan fiction writer who's trying to ape his favourite young adult writer's shitty style.
>made up recap title
I'm dying from cringe again...
Any good tunes of gemma or new mistral?
t. took a break after llama.cpp fiasco.
>>101488408
tiger gemma is functional
>>101488395
It's because [insert assistant LLM here] is playing the character of a corporate assistant with a professional tone, so when you force it to erp an anime girl getting plapped, it's actually roleplaying a corporate assistant with a professional tone who is uncomfortably roleplaying an anime girl getting plapped. So it generates the bare minimum of a caricature of erp text with a heavily exaggerated style, like a parody.
>>101488469
it happens too with models that don't give a shit about complying
>>101488433
No it's fucking not.
>>101488534
I am reading legible outputs produced by tiger gemma right now.
>>101488376
Yeah they'd be appalled if they went to the usenet story archives.
So the Mistral format is the same as the old Mistral one, but with the spaces removed and the system prompt moved inside the last user message?
I know the llama.cpp quants are fucked for Mistral, but how about exl2? Any problems?
tiger gemma 9b werks
friendship with llama 8b officially ended
>>101488159
Community finetunes will make it worse, don't place your bets on them. We're not in the Llama1 days anymore.
>>101488744
I think exl2 quants are broken in subtle ways and we probably won't see the full model quality until post-training quantization algorithms properly take into account that the model has been trained in FP8.
Who the fuck shills gemma? I just tried it and it sucked. Going back to CR+.
wow. tried exllama2 for the mistral nemo model. it's been months and the last time i tried it was the original version.
it's become even slower than back then. what the fuck. is exllama now only for ampere cards and later? i have a pascal card and it's just a horrible experience.
no idea why loading the model takes 2m+ either if it's not from ooba but directly the latest version from github.
auto split directly throws an OOM. huge prompt loading times. what a shitshow.
if we didn't have gpu anon, people with older cards would be fucked. bless him for his work.
I thought mistral fp8 was supposed to be lossless?
https://huggingface.co/neuralmagic/Mistral-Nemo-Instruct-2407-FP8
>>101489040
I would guess people like me who can only use stuff up to ~30b.
stheno is such a horrible experience. I don't get the hype about mixtral at all, and the chinese models in that size range aren't that good either.
Gemma 27b feels like a huge upgrade. It's more "present". You can feel the remnants of the pyg retardation we had back then with smaller models.
>>101489057
That's what they say:
>Mistral NeMo was trained with quantisation awareness, enabling FP8 inference without any performance loss.
>>101489057
That might depend on the exact FP8 format used (exponent/mantissa bits); if there's a mismatch with the one used for training the model, there will be quality loss.
>>101489083
Add to that that l3 70b isn't really good at rp and comprehending user context and feels on par with gemma now in that regard (but light years behind Wiz and CR+), and you've got your audience.
>>101489057
of course there's bound to be some quality loss, especially since they aren't using the exact same code that Mistral used during training
fp errors can accumulate from anywhere, so 99% recovery of the unquanted is still incredible
Local died in 2023. Only 12 good models have been released since then (most of them by Cohere, being already trained by 2023). Local achieved its creative peak in models like Mythomax, L2 Euryale, and SuperCOT, elevating the field into a legitimate SOVL form. Now, thanks to Llama3 and Mixtral, all that potential was squandered and the field has been reduced to mere riddle solvers for reddit idiots (i.e. the lowest common denominator - stop trying to turn open-source AI into corposlop).
>>101489340
you've been called gay in an /lmg/ thread, doesn't mean local died
>>101488469
That's hot.
>>101489178
You've had satisfying experiences roleplaying with WizardLM-2 8x22B? Could you share a prompt / card combo that worked out well?
>>101478112
As awesome as this might look, the problem is that it's still the same vocabulary, from the same basic GPTslop dataset. I recognise each and every one of those rote expressions that are being cut and pasted together, now.
Also, while I still don't completely understand the popular fixation with demons where ERP is concerned, I assume that the appeal in the case of a succubus is a life form that literally views semen as a food source, and who doesn't require emotional gratification before being willing to suck someone off. That was the main reason for the appeal of futanari in my own case, I've realised.
>>101488469
>plapping anime girls instead of your corpo assistant
Grow up.
>>101489564
didn't you say you hated llms and wouldn't be back, petrus/petra?
>>101489340
I never used Euryale, but I view Nous Hermes and Dolphin Mixtral 2.5 as the local peak, personally... although Goliath was an almost spiritual experience, as well.
Broadly speaking I agree, though. Llama3 is corporate woke garbage whose every word sounds like a marketing press release. I think what really crushed me was when I realised just how much the suits WANT language models which act and sound like either L3 or contemporary GPT4. Soulless, sterile, safe, completely predictable... and utterly useless and pointless.
It was a beautiful dream, but it honestly looks like it's over.
>>101489595
yes undi is releasing less models, it's over go away now
>>101489595
Uh, soulless and sterile definitely wasn't the general opinion when Llama-3 got released. Safe, definitely; predictable, probably.
>>101489607
he's just here to doom now, he said so himself before; he hasn't used any new llms
>>101489340
Damn hate to sound overdramatic but owari fucking da
>>101489593
Sorry to disappoint you, Heinrich. I'm still here now and then.
>>101489085
There are two FP8 formats that NVidia proposed: E5M2 and E4M3. Which one does NeMo use?
https://developer.nvidia.com/blog/nvidia-arm-and-intel-publish-fp8-specification-for-standardization-as-an-interchange-format-for-ai/
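For reference, the practical difference between the two (a rough sketch, assuming a recent PyTorch build that exposes both formats as dtypes; the error measurement is just an illustration, not how NeMo was actually quantised):

import torch

# E4M3 (torch.float8_e4m3fn): 4 exponent / 3 mantissa bits -> finer precision, max normal ~448
# E5M2 (torch.float8_e5m2): 5 exponent / 2 mantissa bits -> more dynamic range, max normal ~57344
x = torch.randn(10_000)
for dtype in (torch.float8_e4m3fn, torch.float8_e5m2):
    y = x.to(dtype).to(torch.float32)  # round-trip through FP8
    print(dtype, "mean abs rounding error:", (x - y).abs().mean().item())

E4M3 should win on precision for weight-like values near zero, E5M2 on range, so a mismatch between training and inference formats would show up exactly as the quality loss described above.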
>>101489623
we know... the threads have been utter dogshit, so of course you're here...
>>101489612
I used several L3 finetunes, and I've also used a couple of Drummer's Gemma tunes, as well. Gemma is promising as a coombot, but I honestly just don't have the motivation to test LLMs for anything other than ERP any more.
>>101489436
I've had satisfying experiences making Wiz work through my complicated context and produce logical, believable, horribly cookie-cutter responses out of the box, no special system prompt required. Prose in those can later be prettified/edgified/humanized/followed up with Storywriter. Storywriter by itself fails to understand what's going on even in moderately complicated prompts, much like all other l3 finetunes.
>>101489652
no you haven't, you're literally just here to demoralize because you're burnt out and want others to be as miserable as you
>>101489637
I haven't been posting nearly enough to make that happen by myself. These threads are dogshit mostly because very little is genuinely happening at the moment. If I also can't make even the most innocuous statements without you immediately arching up and telling me to get out, then that's your problem, not mine.
>>101489667
What models have you been testing recently, Anon?
>>101489630
>2 bits of mantissa
You have to be shitting me. At this point why even bother?
>>101489340
And in 2022 we only had 1 model. pyg. No quantization. I wrote it before, but you needed 5 swipes to get something resembling coherent text. And it was amazing.
On the closed side we had mormons running gpt2 in the background who leaked loli chats and banned paying users. Moderators reading over everything.
Either closed or open source, we have it better than ever. Sonnet 3.5 is a huge step up. It's so good. And I really like the new Gemma. Simple prompt and it's much more uncucked vs. llama3, and that's google.
You are probably one of those twitter pajeets who said agi by autumn 2024 and are now crying. I understand zoomers have been fucked over so hard they have no energy left. I'm a millennial. But if I had all those ai tools we have now when I was young and had more time...
I had fucking rpg maker. Needed to ask artfags for their dumb ass charsets and make compromises everywhere. Music and sound effects were a struggle as well.
You can literally make videos for free now with a prompt. Crude and short maybe. But this is so far ahead of what I had I can't even put it in words.
>>101489734
That's what I'm using though?
>>101489792
I'm retarded please ignore
>>101489775
I'm glad you're enjoying it, Anon, honestly. I wish I knew how to get some of your enthusiasm back. I think my real problem is that I was around for maybe the last three months before Character.AI went completely to shit, and anyone who experienced that will understand how hard it is to let go of that memory. Anything else we experience, short of literal AGI, feels like a step down by comparison. Goliath was the only other model I've seen that has come close.
>>101489834
This, but unironically.
>>101489834
>I'm glad you're enjoying it, Anon, honestly
no otherwise you wouldn't be here demoralizing
>>101489834
Fair enough anon. I think I understand, at the beginning chatgpt was so good. It sniffed out what you wanted without it being explicitly prompted. Difficult to put in words, but it was like its mission was to serve the user as best as possible. That lasted a very short time and it's never been the same.
Just lean back and relax. Even alignment-wise it seems we're heading in a better direction.
I don't get all the character.ai love. Yes, I tried the first versions for a few days, then I didn't care anymore.
>>101489929
you didn't care anymore because you couldn't sex it.
>>101488042
Wtf, Fish Audio is actually good? It's a bit slow but I can tolerate that for the sheer quality compared to other models.
>>101485864
Sweet fiddler's fuck, I haven't seen this many gpt-isms in a long while. Husky whispers, going on journeys together and forming bonds, it's all there. On the plus side the characters are a lot more well-spoken than I've seen with other models. As usual at 70b the question is 'is this model worth losing 32k context' and the answer is definitely not here. Also has GGUF gotten better over the last 3ish months? My gen times are cut in half.
>>101489957
First version could be sexed to hell and back, though.
>>101490120
>Also has GGUF gotten better over the last 3ish months? My gen times are cut in half.
yes, there are lots of people saying it's gotten much closer to exl2 speeds recently
>>101490129
It didn't last very long
NEMO LCPP STATUS?
>>101489907
>Even alignment-wise it seems we're heading in a better direction.
I don't understand why /lmg/ wasn't more enthusiastic about Dolphin Mixtral 2.5 in particular, to be honest. It was great. Great compliance with prompts, and text generation that honestly felt close to GPT4 at times in my experience.
https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF
If you've never tried it, give it a go, Anon. It's awesome.
>>101490333
>Dolphin
gptslopped to hell that's why
will 400b be our savior or the final nail in the coffin of open source llms
>>101488879
can it follow chat formatting? retain all quotes and asterisks?
>>101490385
no
>>101490382
It depends on the instruct tune provided by Meta; hopefully it won't be as cucked as the previous L3-instruct. Almost nobody will be able to finetune it, although the model is so large and will have a long enough context size that perhaps in-context learning with the base model will be enough for most uses.
>>101490333
>Dolphin Mixtral
I like it more than L3 and recent models (with the exception of CR) for chatting.
>>101490333
>>101490431
limarp-zloss or dolphin?
>>101490463
Dolphin is honestly GPT6 tier, no cap.
>>101488042
>give cats breakfast
>their excitement is so palpable that it sends shivers down my spine
If you had the chance to purchase a server with 8x AMD Instinct MI100 32GB GPUs for 7000€, would you do it?
>>101490515
gpt brainrot has consumed you
>>101490385
sometimes
>>101490120
The latest llama.cpp update has really improved speeds, especially with full gpu offloading, may be as fast as exl2 now. Booba hasn't updated yet though so it's still slow on boobs.
>>101490545
>AMD
>>101490693
yeah, that's the point of the question.
>>101490545
I would say no...
#1. Pricing it out on a GPUs-only basis, you're not really getting any kind of bulk buy discount on that if we go from the bottom of the stack.
#2. No bitsandbytes support. Which is fine for just running models in exl2 or gguf, no bitsandbytes needed. But if you want to start playing around with training you're more or less relegated to fp16 training.
#3. Each card has about 1/3rd the fp16 performance of a 3090. So even if you did find a massive model to load up with them, due to the inefficiencies added with multiple GPUs, which get worse with every card you add, with let's say Q4 405B you're probably not looking at a particularly usable experience. More usable than a gen-1 epyc or haswell xeon rig, but that's not saying much with a model that big.
Are we getting any other sizes next week or just 405B? Saw an anon say they are refreshing the whole lineup.
>>101490545
For that price you can buy a proper CPUMAXX server with 500GB RAM that can run 70b at like 7t/s
>>101490785
rumors of 8/70B 128k
>>101490382
I don't think they'll risk having it too aligned, it will probably be the closest to uncensored yet.
Installing wheels from source gives some ninja error, and how is this supposed to work?

export VLLM_VERSION=0.5.2 # vLLM's main branch version is currently set to latest released tag
pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/nightly/vllm-${VLLM_VERSION}-cp38-abi3-manylinux1_x86_64.whl
# You can also access a specific commit
# export VLLM_COMMIT=...
# pip install https://vllm-wheels.s3.us-west-2.amazonaws.com/${VLLM_COMMIT}/vllm-${VLLM_VERSION}-cp38-abi3-manylinux1_x86_64.whl

Means there are supposed to be wheels for each commit, but I have to specify a version, which would be older than the commit? (url doesn't work)
>>101490725
>Each card has about 1/3rd the fp16 performance of a 3090
huh? I looked at wikipedia and it said 184.6 TFLOPS for FP16. And 3090 has like 35 TFLOPS for fp16. is that misinformation?
>>101490998 (Me)
just to confirm, i went onto amds website, and it said the same thing. https://www.amd.com/en/products/accelerators/instinct/mi100.html
I feel so good about not buying a second card just for this. I seriously considered that for a moment.
>>101491029
Oh, google lied. My bad. That's actually really good. I don't know exactly where the memory bandwidth bottleneck kicks in with gpu inferencing so I still can't say if it would be good for 400B or not. I can appreciate wanting to run it for the memes, but if you feel like CR+ and 70B aren't good enough the reality is nothing will actually please you and you'll just end up with buyer's remorse.
405B isn't going to inherently have a better writing style, it's just going to be less prone to making retarded mistakes where it generates words inappropriate to what's going on, right?
>>101491153
And it will be a trivia master.
>>101491087
It's not for cooming, it's for preparing for the dystopia :)
>>101491178
I'd still be worried about the lack of AMD/legacy support for bitsandbytes. Sucks that huggingface sucks Jensen's cock that hard but it is what it is. Because you'll probably want to be able to train in the dystopia.
>>101491041
Imagine how those that bought 6 must be feeling right now.
>>101491173
Riddle master, but bad at trivia.
what type of person uses ai for anything but cooming
>>101491460
Indians use it for programming
>>101491460
a man skilled in tech but not social interaction
many such cases
>>101488159
It doesn't need a finetune.
>>101490385
yes
>>101490998
>>101491029
The 35.6 TFLOPS number is for regular FP16 operations. With tensor cores an RTX 3090 has 142 FP16 TFLOPS or 284 int8 TOPS.
What would be a good tiny model to pack into a game? For now I'm thinking
>teknium/OpenHermes-2.5-Mistral-7B (4GB ram usage)
Having just fapped to a game with my fucked up fetish + horrendous writing my post nut clarity made me think about something. Is the problem uncanny valley in text form? The quality of writing in that game is absolutely atrocious. Like 14 year old fanfic level. But I don't mind it that much. On the other hand when I see shivertastic beneath the whisper gleams in the eyes I start to quickly lose my erection.
>>101491690
You sound illiterate.
>>101491690
No, the problem is data diversity. LLMs have consumed too much data with shiverslop and not enough data without it. Erotic fiction is already a niche, and erotic fiction written in an unslopped way is even rarer. We need better data, possibly hybrid data (heavily edited synth data), because we don't have a lot of human data.
>>101491690
is the game called euphoria
>>101491690
The solution is to replace erotic literature dialog with translated eroge visual novel dialog. The shivers and whispers will be replaced with can't be helped's and pleasures of being cummed inside
>>101490333
What about the other dolphin models? I think I had dolphin dbrx downloaded but never tried it.
Word on the street is that this is an upgrade over niitama.
https://huggingface.co/nothingiisreal/L3-8B-Celeste-V1.2
>>101491880
yes it is
>>101491880
no it isn't
>>101491923
>using Reddit
What Kobold Preset do I use in Silly Tavern for gemma 2 based models? I'm on the latest version of Silly Tavern, which has presets for context and instruct in the advanced formatting section, but the Kobold Presets in the first menu don't seem to have anything for Gemma2?
>>101491990
neutralize samplers
temp 1
>>101491640
so if 3090 is better for integer does that mean it'll be superior in 2 weeks time when bitnet becomes the norm?
>>101492046
I'm struggling to get even 50% of the peak int8 tensor core throughput for MMQ so probably not.
>>101491923
Using reddit, very smart of you, unironically.
Is Gemma 2 27B generation quality on Exllama on par with Llama.cpp yet?
>>101492135
No, nobody even made an issue.
does mistral work properly with llama.cpp?
As my eyes devoured those overused phrases and hackneyed words, I couldn't help but feel a fiery mix of frustration and exasperation coursing through my veins. My blood practically simmered with righteous indignation, threatening to boil over at any moment. A shiver of annoyance ran down my spine, and I found myself clenching my fists, my knuckles turning white with barely contained irritation. The very sight of such literary clichés sent waves of displeasure pooling in my belly, my jaw tightening as I struggled to contain the tempest of emotions swirling within me.
I'm on vacation, but remoted in to my rig long enough to do a recapbot test with the new deepseek. I haven't been following the threads closely enough to really evaluate performance, but on the surface it seems to have done a good job. How does it compare to recapanon's multistage recapbot's output for the last thread?
>>101492182
No.
llama_model_load: error loading model: check_tensor_dims: tensor 'blk.0.attn_q.weight' has wrong shape; expected 5120, 5120, got 5120, 4096, 1, 1
I just realized I find 1 on 1 RPs extremely boring. I want a group of characters interacting with me and each other, each one with their own thoughts, life and motivations. Oh well, 10 years or something to wait.
>>101490316
https://github.com/ggerganov/llama.cpp/issues/8577
Support for the custom 'Tekken' tokeniser just got pushed 30 mins ago. An inference implementation will soon follow.
>>101492266
Nice. Can't wait for this and then the dozen tokenization bug fixes later on.
How's base vs instruct Nemo for RP?
I don't even know what to do with 128k context
>>101492265
Most models are still trained with a 1-on-1 paradigm (user-assistant), that will need to change first.
>>101492265
skill issue
>>101492286
they're both terrible
>>101492233
How many t/s do you get and at what quant? I got 5-6 t/s at bf16
>>101492241
this works but you need to compile it yourself, hopefully "official" support soon enough
https://github.com/iamlemec/llama.cpp/tree/mistral-nemo
>>101492297
For me? I roleplay as an immortal being and go around impregnating random girls, fast forwarding a decade or two, impregnating my daughters then revealing my relationship to them after they give birth and having a kek at their reaction, and repeating this infinitely.
>>101492266
WE'RE SO BACK???
>>101492302
skill issue my ass, sota models can't even realistically portray one character, not to mention multiple ones
>>101492302
skill issue
>>101492265
I imagine somebody would have to program the model to speak for each character in the group, processing each personality and the conversation up to that point.
They could call it group chat or something. Alas and lack the day, such functionality is just a dream.
How do you set up the AI to have a cooming session?
I installed KoboldAI, installed what I assume is a good model - OpenHermes-2.5-Mistral-7B, tried chatting with it, but it gives kind of shit answers.
Do you have like a top tier chatting preset? Am I using the right model? Where do you get character cards now? I don't see anything in OP.
3080 12gb, 16gb ram.
I just want to rant that the way "system messages" are implemented in Mistral NeMo Instruct is utterly retarded.
>>101492344
>sota models can't even realistically portray one character
I take my earlier "skill issue" comment back and aim it at this one
>>101492357
both just seemed broken to me
>>101492328
I've tried doing a long roleplay but it just turns into a collaborative writing session because I'm the one that ends up guiding the narrative anyway. I wish there was an event system or something, like a wildcard prompt injector that you can load with dragon attacks or equipment breaking down or whatever
>>101492372
>installed what I assume is a good model - OpenHermes-2.5-Mistral-7B
bad bait
>>101492297
You use it to insert up-to-date information on whatever the fuck you're trying to do and enjoy 12B speeds and current-day-bigger-than-12B knowledge.
>>101492374
How's it retarded? They're trained to continue token sequences. A token is a token. Whether or not it's \n or <|system_message_end_im_a_midwit_retard_and_need_handholding|>
>>101492408
Other than coding and other long-context productivity scenarios, in theory 128k of usable context would be very useful for in-context learning and base models. I haven't had much luck with models released so far though, they generally tend to get confused with too much information in context.
>>101492385
If you think that Claude 3.5 Sonnet, GPT-4o or whatever is even close to emulating human behavior, then you have to touch some grass and talk to actual people. I will assume you are trolling and not being retarded or a basement dweller who hasn't seen the sun for 20 years.
>>101492396
I don't go to this general like you so I have no idea what's meta right now. I only use Stable Diffusion.
>>101492454
>I don't go to this general like you
yeah which means you need to go back, likely to reddit
>>101492362
I can program a button with the label "click to cum in 1 sec", doesn't mean it will.
>>101492478
>realistically portray one character
vs
>close to emulating human behavior
If you think those things are equivalent, then I'm not the retard
What nemo instruct version are you using? Unsloth?
>>101492478
They are virtually the same things, now you are nitpicking.
>>101492388
You can just prompt it to include random events and twists of fate. Tell it to make it realistic or as wacky as possible.
Funniest thing the AI did for me was introduce Shrek walking into the cafe my daughter-wife and I were at then slapping the cashier after they said they don't sell onions.
Works perfect for me using L3 70B New Dawn
>>101492533
It's very clear that they are virtually the same thing /to you/.
>>101492362
You literally just need to set up a loop where each character is an isolated agent, have it iterate over the characters, provide the context of the conversation, and then use JSON to choose from a list of actions such as wait, reply, etc. Example: character "chooses" wait, then it moves on to the next; character "chooses" reply, then it re-prompts the model for the character to give a reply. Use regex to make sure it's writing for the correct character and discard the reply if it fucks up. It's not rocket science. You could even have it maintain a text file for each character containing dynamic summaries and have it added to the context. Why has nobody done it yet? Because it's easier to just make a multi-character card and deal with the shortcomings, since after you coom you're not going to give a shit about any of it anymore anyway.
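Rough sketch of the loop (the generate() stub, the action schema, and the character names are all made up for illustration; swap in whatever backend API you actually call):

import json, re

def generate(prompt: str) -> str:
    # Stand-in for your real backend call (koboldcpp / llama-server / exllama).
    # Returns canned output here so the sketch runs end to end.
    return '{"action": "reply"}' if '"action"' in prompt else "Alice: Hi there."

characters = {"Alice": "dynamic summary of Alice", "Bob": "dynamic summary of Bob"}
history = ["User: The tavern door creaks open."]

for name, summary in characters.items():
    # Each character is an isolated agent that first picks an action as JSON.
    decision = generate(
        f"You are {name}. {summary}\nConversation so far:\n" + "\n".join(history)
        + '\nAnswer with JSON: {"action": "wait"} or {"action": "reply"}'
    )
    try:
        action = json.loads(decision).get("action")
    except json.JSONDecodeError:
        continue  # malformed JSON -> treat it as a wait
    if action != "reply":
        continue
    reply = generate(f"Write {name}'s next message only.\n" + "\n".join(history))
    # Regex check: discard the reply if it speaks for the wrong character.
    if re.match(rf"^{re.escape(name)}:", reply):
        history.append(reply)

print(history)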
>>101492563
yes they are, something else?
/lmg/ is like the concentration of the worst people on /g/
>>101492301
There's something called 'writing a story'. You should look it up. Maybe pick up a book sometime?
>>101492627
models should pick up a book first, they are terrible at writing
>>101492603
Character portrayal is not restricted to humans or human behaviour. Not a very shocking option.
>>101492619
Nah, that's aicg
>>101492589
jeez, he was being sarcastic. group chats have existed for months in silly...
https://docs.sillytavern.app/usage/core-concepts/groupchats/
>Swap character cards
>>101492638
both are pretty shit desu
>>101492634
Let me guess, only Kayra can write stories? Go back to /aids/, shill.
>>101492645
tavern is jeet spaghetti code, though, and anything it does can probably be done better and with about a thousand fewer lines of code.
>>101492638
aicg is better because it has people interested in writing. /g/ is just Americans seething about Indians and anyone else smarter than them.
>>101492668
>nitpicking: noun, /ˈnɪtˌpɪk.ɪŋ/ us /ˈnɪtˌpɪk.ɪŋ/. Giving too much attention to details that are not important, especially as a way of criticizing.
>>101492658
>Why has nobody done it yet?
>nobody
>>101492658
And yet there is no 1000-line reimplementation of a better Tavern, and there hasn't been all year. Any idea why?
>>101492649
since when is Kayra not a model, retard? take your meds
>>101492678
why bother? It just works.
LimaRP-DS dataset now available, as promised.
I also trained one model on this (sunfall-v0.5). It feels... refreshing.
>>101492700
I'll wait for LimaRP-3DSXL
I know how nintendo is with these things.
>>101492731
Nope, that's not a nitpick. You just seem unable to grasp the concept of what I've explained to you, and that's okay.
>>101492700
based! thanks anon!
>>101492733
>Ackchyually the word 'character' can refer to non-human persona. They can be monsters, animals, and other creatures! They don't behave like humans! I'm very smart.
anons... i won't lie: mistral nemo feels different
it's a little retarded but it has sovl, i've been starved of relevant sovl since the mythomax days and it feels weird having a model that doesn't spew out the same standard flowery shit over and over again
definitely need to tweak parameters because i'm using my bagel misterytour template and it's not quite the best for it, but damn mistral outdone themselves on this one
>>101492776
buy an ad arthur
>>101492717
>>101492733
You're welcome. I realized that I kind of demolished some of the structure of the original dataset when I did the conversion (e.g. the data-long vs data-short dirs are gone; also trashed all BAD and WIP entries). A fine tuner with moderate level IQ should be able to get it right, but I may restore some of that if needed.
I'm starting to look into this whole local models thing, and I have a 6gb card that seems to run llama 3 8b fine.
I was willing to upgrade to a 16gb card, but the more I read the more pointless it seems? Apparently I won't be running the 70b version anyway.
Does having 16gb vram even matter for casual prompts? I'm having a hard time gathering factual information about how this whole thing works over a bunch of "(doesn't) work on my machine".
>>101492793
you're a dumb nigger, you're the blackest retard gorilla i have ever seen
>>101492785
>posts about merge/tune
>Kys shill only use true corpo models!!!
>posts about corpo model
>Buy an ad
...
>>101492793
I wouldn't upgrade unless you were going for at least 24gb.
T. Running 8gb of vram.
>>101492793
not worth it until you get at least 24+gb vram no
>>101492700
Cool. Will take a look.
>>101492785
>>101492811
samefag
>>101492638
lmg is aicg-lite, same shills, same baits, same avatartrannies.
>>101492441
doesn't that code mean the system message is prepended to the last user message?
we've been doing that for a long time now with last assistant prefix and depth 0 inserts and such
>>101492297
I'm gonna use it to run huge simulator cards like this: https://chub.ai/characters/Branon/shin-megami-tensei-simulator-v2-0-d1ac08fc
If it works well I'm going to modify it to have custom AI era games for every niche franchise I like
>>101492864
>we've been doing that for a long time now with last assistant prefix and depth 0 inserts and such
apparently a ton of people are still doing sysprompt up top, then they wonder why the models don't follow instructions...
>>101492813
>>101492793
96gb vramlet here, if you're patient mistral-nemo seems like it might be promising and would be borderline useable with partial cpu offload. You're going to want a lot of system RAM anyway for this hobby, so upgrading your ram to make sure you have enough for MiNeMo running mostly on CPU might be a good first step before you go busting out big bucks for GPU upgrades. Then decide from there if you want/need more.
>>101492864
Models started following system prompts since Mixtral. And the problem with putting heavy instructions before the last message was that it breaks the flow of the conversation, at least with older models.
>>101492219
this would be fine if you read it in a book on page 200 out of 500. It's only a problem when you get it by message #5 in your ERP. Book sex doesn't happen after 5 paragraphs, and by prompting for sex you are technically also prompting for dramatic page 200 shit too, cuz that's where it all goes.
>>101492914
then maybe nemo should now be much better at this than any other model, since it was trained to follow conversations with the system prompt at the end
>>101492849
Anything that could be called functionality or behavior is emergent. It's just auto-completing. And it's all highly generalized through training.
Literally the only people I see constantly sperging out and hair-pulling are the people who obsess over reddit bullshit like system prompts. I have never given a fuck about system prompts, or insertion depth, and never had a fucking issue. I don't know what you people keep going on about. Learn to see it for what it is: A text predictor.
From the very first token to the very last. That's literally all it does.
>>101492971
You want to be in distribution
>>101492971
trvth
>>101492971
Your IQ is negative
I'm trying to write a short story about a woman possessed by a nympho demon and it's struggling. It treats both characters as either the same person or separate people, not one person with two consciousnesses.
am I asking too much?
>>101490811
wouldn't their largest open model be the most censored? to show people that it's safe to have open models.
how do i find quantized models on huggingface? I lurked here for ages until i saw one for gemma but i'm itching to try nemo now and no one has posted a link that i've seen. yes i'm retarded no need to point that out
>>101491942
>>101491961
None of those were (me)
>>101493386
Not really. What model, quant, backend, frontend, instruct template, etc. are you using? Also, share your initial message, character card, sys message, etc.
>>101493469
look for "model name gguf" if you use llamacpp or "model name bpw" for exllama2.
>>101493469
>type "mistral nemo exl2" or "mistral nemo gguf" in search bar
>???
>profit
>>101493386
>am I asking too much?
yes, LLMs are shitty in 2024, wait a few years
>>101493505
>>101493514
well that was easier than i thought, thanks
>>101493514
>gguf
wait, the support got merged?
>>101492374
*cracks knuckles* those Claude jb <tag> wrappings are making extra sense now
>>101492374
llama.cpp ignores all that anyway.
>>101492938
every mistral model has handled system prompts the same way, I remember remarking on it when they first released their API service
>>101492971
>I have never given a fuck about system prompts, or insertion depth, and never had a fucking issue. I don't know what you people keep going on about. Learn to see it for what it is: A text predictor.
>From the very first token to the very last. That's literally all it does.
But I want to believe that Chun Li is speaking to me, Anon. Do you really want to take that away from me?
>>101493593
Yeah, but that implies the model has been trained in that way, with system instructions separated by a double newline from the actual user request.
Putting aside how using a double newline as a separator conflicts with the way most character cards and instructions are formatted, reproducing that prompting in a non-hacky way in SillyTavern doesn't seem possible right now either; there is no "last user message" and certain macros don't work in instruct sequences.
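To make it concrete, here's roughly what the flattening looks like (a sketch based on the double-newline behaviour described above; treat the exact spacing and control tokens as an approximation of what mistral_common does, not gospel):

def build_prompt(system: str, turns) -> str:
    # turns: list of (user, assistant) pairs, assistant=None for the pending reply
    out = "<s>"
    for i, (user, assistant) in enumerate(turns):
        if i == len(turns) - 1:
            user = system + "\n\n" + user  # system glued onto the LAST user turn
        out += f"[INST]{user}[/INST]"
        if assistant is not None:
            out += assistant + "</s>"
    return out

print(build_prompt("Write like a pirate.", [("hi", "ahoy"), ("how are you?", None)]))
# -> <s>[INST]hi[/INST]ahoy</s>[INST]Write like a pirate.\n\nhow are you?[/INST]

Which also shows why it's awkward for SillyTavern: the "system prompt" slot moves to a different place every turn.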
>>101493705
Finally, an opportunity to use this image.
>>101493650
TIL
accept this silver token of gratitude
>>101491779
could it be possible to increase the loss on rarely-occurring phrases during training? hmm
mistralbros... we're so fucking back
>>101493848
focal loss?
>>101493848
Yes, you can do that with a custom loss function.
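Something like this, for example (a focal-loss-style sketch in PyTorch; whether this actually helps an LLM shed slop is an open question, it just shows the mechanism):

import torch
import torch.nn.functional as F

def focal_next_token_loss(logits, targets, gamma=2.0):
    # logits: (batch, seq, vocab); targets: (batch, seq)
    ce = F.cross_entropy(logits.transpose(1, 2), targets, reduction="none")
    p = torch.exp(-ce)  # model's current probability of the true token
    # Down-weight tokens the model already predicts confidently (memorized
    # slop), so rare/surprising tokens dominate the gradient.
    return ((1.0 - p) ** gamma * ce).mean()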
>>101493894
yeah, seems that's exactly what i was talking about, lol
is this already being used for llms?
>>101493848
>>101494075
actually, why isn't it possible to artificially increase the sample size of the rare data during training by duplicating it instead of scaling the loss?
>>101493870
Feels llm-generated. I bet it will be full of slop come erp.
I need a local text model to reformulate text, improve wording and add corpo sugar. Which small model (<20B) would be best for that task? From my search, llama 3, phi 3, mistral nemo, gemma 2 or qwen2 are the current good small models.
As compute scales, its outputs will be increasingly kino.
>>101494179
what exactly do you duplicate? the entire text containing the rare sequence, which may contain overused slop, or just the text, which becomes nonsense without its context?
>>101491668
what's your plan?
>>101494269
wouldn't you like to know
>>101493821
>WAGMI 2021
did they make it?
>>101494348
no
>>101494246
i guess just defining what's 'rare' in the context of llms is a problem in and of itself
you can't scale the loss arbitrarily on some random tokens either, can you?
>>101494211
Probably phi3, but you are better off just trying them all and seeing which works best for you.
>>101494413
You probably can.
>>101493870
formatting is all fucked
first paragraph has asterisks, then it misses asterisks between quotes, and it appends a single asterisk at the end. Kinda like gemma. I don't remember even 8b l3 having issues with formatting like this; now both gemma and mistral mess it up. are we regressing?
>>101494559
>it appends a single asterisk at the end
actually the last three paragraphs.
>>101478725
This kind of poorfag cope is straight up disinformation. It infects this cesspool of a general like a virus.
>>101494413
I mentioned focal loss because that one doesn't rely on any quality inherent to the dataset, only whether the model is already accustomed to the data for which the loss is being evaluated, but it's an object classification/detection thing and I don't know if anything similar has been done for LLMs
>>101494609
t. overspent on hardware to run obsolete big models
>>101494609
I switched from CR+ to Nemo, your fallacy doesn't hold water.
>>101494627
>obsolete big models
4 more days.
>>101494503
It's hard to judge. I tried llama 3 but just an instruction like "Modify the following text to improve grammar and spelling:" changed my text a lot, even changing the meaning. GPT4-o or 3.5 sonnet are able to do it effortlessly.
>>101492533
Look at this fag get checkmated and BTFO and then scramble to salvage his fragile ego XD XD XD
>>101494657
stop
get some help
>>101494655
1) You can't compare an 8B model with GPT-4o.
2) You literally told the model to modify the text.
Anyone else getting this error?
>>101487352
hey anon, this prompt format is wrong, and you're also supposed to include the past translations.
>>101493870
Haters can hate, but I think this is awesome.
woops!
>>101494626
we can also approach this problem inversely
we can split a huge dataset into distinct clusters based on some criteria (let's say similarity or topic) and sample from each cluster uniformly (or giving priority to extremely rare pieces) until we find the dataset big enough, but that will lead us exactly to training on
>the entire text containing the rare sequence, which may contain overused slop
still worth a shot maybe
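toy version of what i mean (the clustering criterion is a placeholder; in practice you'd use topic labels or embedding k-means):

import random
from collections import defaultdict

def cluster_balanced_sample(docs, cluster_of, n, seed=0):
    # Sample uniformly over clusters instead of over the raw corpus,
    # so rare topics get overrepresented relative to their natural frequency.
    rng = random.Random(seed)
    clusters = defaultdict(list)
    for d in docs:
        clusters[cluster_of(d)].append(d)
    pools = list(clusters.values())
    return [rng.choice(rng.choice(pools)) for _ in range(n)]

# 'rare' is 10% of the corpus but ~50% of the sample:
docs = ["slop"] * 90 + ["rare"] * 10
sample = cluster_balanced_sample(docs, lambda d: d, 1000)
print(sample.count("rare") / len(sample))  # ~0.5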
>>101494920
!!!
>>101494920
Bitnet dreams crushed
>>101494609
buyers remorse cope
i switched from wizard 8x22 to stheno and i'm loving it!
>>101494983
>>101494920
it ded
>>101494983
>18 days ago
bad bait
>>101494983
Should've been a rickroll.
Sorry for the spoonfeed beg but I've never tried to or had to use anything other than gguf and exl2. How the fuck do I run an FP8 model?
>>101495002
hi sao
>>101495055
newfag, i also used to use ggml
>>101495055
You use bitsandbytes with the pytorch API
There's a huggingface wrapper but you need an nvidia GPU. CPU inference has been broken for years.
>>101494920
I was promised bitnet
What the fuck
>>101495155
By who?
>>101495155
Nobody promised bitnet. When bitnet became legit people said "I wish we were getting bitnet instead of 405B that's probably already obsolete" and somehow that played telephone to people thinking the next thing would be bitnet.
>>101495163
me
>>101495155
sorry anon
>>101495194
And who are you?
>>101495201
i'm anon
>>101495241
Why?
>>101495241
Never heard of him.
>>101492678
You could probably hack this into my neovim macros.
forget 405b, stellar dong is here
https://huggingface.co/smelborp/StellarDong-72b
>>101495272
Stupid shill. Do you not even know the difference between a dong and a gong?
IT'S UP
https://huggingface.co/PrimeIntellect/Meta-Llama-3-405B-Instruct
>>101495306
the gong goes dong :)
>>101488376
I think this is really good. Make overused cliches and garbage prose be filtered as "gpt slop" and suddenly authors need to start writing properly again or have their works be accused of being AI garbage.
>>101495316
>832 GBs
Who the literal fuck is this even for?
>>101495437
It'll be only 400GB at Q8 and 200GB at Q4.
>>101495450
Who is that even fucking for? What is the use case?
>>101495450
And it'll still send shivers down your spine.
>>101495450
That model is fake, the official one won't be that heavy.
>>101495316
>"max_position_embeddings": 8192,
>>101495437
What, you're telling me you don't have 10 H100s?
Vramlets.
>>101495463
API
>>101495465
anon... 405B is 405B
>>101495450
It would fit in 128GB at Q2. My motherboard supports this much memory although I think I'd want more CPU cores.
vramlets already coping ITT, lmao.
But don't worry, bitnet in two more weeks, Q* predicted this.
>>101495450
iMat_IQ1_XXXS when?
>gonna git bitnet one way or the other
>>101495501
>vramlets already coping ITT, lmao
We're having fun with it. We haven't even seen yet if 405B can count the R's in strawberry, can compare 9.9 and 9.11, or can speak in a low tone that's not barely above a whisper.
Closed model companies serve their shit on GPU server farms that have 500% better cooling and wattage efficiency than the average hobbyist. Cloud models are clearly the future. I bet if you live in some EU shithole like Germany it'll be cheaper to just pay for Claude Sonnet than try to run a shitty 70B in your dual 3090 rig.
>>101495815
that's obvious. It's also cheaper because they can run requests in batches.
is the Echidna model recommended in the guides good or will it send shivers down my spine?
>>101495815
>renting it is cheaper than hosting it yourself
Are there people who think otherwise?
>>101495903
>Echidna
it's 9 months old
>>101495955
Yeah, I figured, it's why I'm asking... all the guides are pretty ancient or just refer to basic models.
>>101488157
You aren't fond of their gentle ministrations?
>>101489775
Where can I make free videos and can they be spicy?
>>101489834
Opus
>>101494559
Both Gemma and this mistral are really smart when it comes to Japanese translations, but fucky formatting errors and overlooked text make them non-options for me. Sorry VNTL dude.
>>101490020
What's this fish what what?
>>101496116
A text-to-speech model that can clone voices from 5-10 second files, like xttsv2.
>>101491880
I was so disappointed we didn't get to tickle the stoic girl, only Nemu…
>>101492265
Opus can do it..
>>101492328
Post logs.
>>101495903
Get the new Mistral 12B, it's not cucked, so no fine-tuning is necessary.
>>101492388
That's already a thing to the point where I find it annoying to deal with the curveballs and interruptions with my prompts
>>101492444
I mean if you compare the best models to the dumbest humans…
>>101496256
is that even available as gguf yet?
>>101496335
not officially, don't know what they're doing. using some rando's fork, I've been testing it all day...
>>101496335
It's available in exl2. Are you vramlet?
>>101496152
Is it free? Can I do spicy stuff with it? Where? Where?!
>>101496357
12GB... AMD...
>>101496357
I will never use exlmeme.
>>101496440
I use whatever works atm
>>101496440
based
same here, I'm too lazy to install some other shit just for one model.
>>101496440
>Translation: yes.
So what is the verdict on the new Mistral?
>>101496504
Not awful, an interesting vramlet sidegrade, it does tend to repeat itself a tad.
so what is the new deepseek supposed to be good at anyway?
>>101496539
Explaining the historical events that have occurred at Tiananmen Square.
>>101496504
Okay for vramlets
>>101496539
I did some experiments and it doesn't feel different than the old one, at all.
>>101496369
It's an open source model. Just google "Fish Audio".
>>101496504
Better than Llama 3 70B.
>>101496852
Hardly an achievement.
>>101496504
Best model smaller than 70B. The context is super nice.
>>101496937
For RP / creative writing I mean btw. Gemma 27B is a lot smarter but too dry. Nu-mistral has soul.
I used to think Chub was degenerate, but every time I assume that it can't possibly get any more sick, somehow it still manages to surprise me. It's made me realise that that's why the /pol/tards want to take over society; to get rid of that stuff.
Hard degen cards are pretty pointless now though, because there are virtually no recent models that will run them authentically.
apologize
>>101496852
Why does Meta suck so hard?
>>101497072
So is the current HF repo with 8k context fake?
>>101497072
3.1 128K context
YES
send help I can't stop making degen shit
Qwen2-72B-Instruct-Q5_K_M
oh look it's been 0.01s since I last genned deepthroat smut
retards
>>101497148
How do you freaks get off to that stuff? To me that's just boring. I could barely tell where the erotic material even was, in amongst all the purple prose.
>>101497205
ask my dick
>>101497144
Cool, when are we getting 70b bitnet with 128k context though?
>>101497246
>>101497246
>>101497246
>>101496965
I mean these people do not even hide it. They are proud of it and signal it to the world. I am not a /pol/tard but I would not mind if these people were purged.
>>101497298
Retard
>>101496965
Really? Haven't the worst offenders stopped bot making entirely?
>>101497148
More.