/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102192656 & >>102179805

►News
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed
>(08/29) Qwen2-VL 2B & 7B image+video models released: https://qwenlm.github.io/blog/qwen2-vl/
>(08/27) CogVideoX-5B, diffusion transformer text-to-video model: https://hf.co/THUDM/CogVideoX-5b
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>102192656

--Paper: SelectTTS: A novel multi-speaker TTS method with code release: >>102193789 >>102194203
--Paper: Fully Pipelined Distributed Transformer for training ultra-long context language models: >>102193949 >>102193977
--Visual novel scripts and datasets exist, but require augmentation and have limitations: >>102193465 >>102194139 >>102202930 >>102198856 >>102198965 >>102204223
--Tesla M40 considered old, recommendations for better GPUs: >>102193309 >>102193700 >>102194931
--Local model performance and speed discussion: >>102194156 >>102194568 >>102195019 >>102195142 >>102195187 >>102195262
--Anon asks about system prompts without {{char}} to minimize context reprocessing: >>102196554 >>102196590 >>102196645 >>102196811 >>102196835 >>102196910 >>102197323 >>102197346 >>102197343 >>102197568
--Speculative decoding and draft model's context cache RAM usage: >>102193431 >>102193445 >>102193471
--Running large models with low VRAM and more regular RAM, but slow speeds: >>102192934 >>102193224 >>102193267
--Q4 cache is better than FP8 cache for model performance: >>102198537 >>102198813 >>102199058 >>102199107
--Prompt processing slower on Linux than Windows in koboldcpp-rocm: >>102198098 >>102198169 >>102198206
--LLMs lack context and documentation to answer setup questions: >>102200564 >>102200582 >>102200595 >>102201224 >>102201404
--Inference speed significantly affects LLM user experience and enjoyment: >>102194854 >>102195013 >>102196309 >>102196322 >>102196351 >>102196465
--Gemma VNTL recommended for manual Japanese porn translation: >>102205854 >>102206944
--Deepseek coder v2, mistral large, and llama 3.1 405b suggested as self-hosted programming LLMs for C/C++: >>102201300 >>102201333 >>102201366 >>102201378
--Disappointment with Command-R's performance after RAM upgrade: >>102205966
--Miku (free space): >>102196188 >>102196631 >>102196779

►Recent Highlight Posts from the Previous Thread: >>102192660
Oh my god it's Teto
Man, RPing at 1t/s is abysmal, how do you niggers do it? Is there some kind of meditation-type exercise I need to perform?
>>102210069
it's called playing videogames while waiting for the response

>write a shitty sloppa card and load basic bitch lunaris for a quickie
>expect /ss/
>keep hitting generate and let it run its thing
>get /ss/
>also get kidnapping, mindbreak, loss of innocence, rape, rape, filth, rape again, the occasional sloppa phrase, and despair
i don't know what i expected but i let it cook too long
no one must know

>>102210091
but now we all know

>>102210101
damn. that is true

>>102210069
It's not that bad if you've ever spent time RPing with real human beans who take ten to fifteen minutes to reply with some absolute fucking dogshit that you can't just swipe and retry, manually edit, or tell them it sucks, and half the time they'd flake out and never reply again anyway.

>>102210005
Hello Teto. Thanks for reminding me it's Tuesday newsday.

>>102210069
I don't do it. I am waiting for a new 8x22B or equivalent model/method to have a fast, smart model on a consumer PC.

>>102210114
NTA but yeah. My ex was a pretty slow typer. He was before I discovered AI ERP though. I've had a few sessions with human partners since, and the worst, sloppiest 8B model is still superior to most human partners. People in this space are getting spoiled and over-stimulated.

>>102210069
Which model is worth 1t/s? Are you running a potato?

>>102210069
pretend it's text messages from your bae

>>102210222
70B+

>>102210069
I tell myself it's email, not texting.

>>102210181
It's a mixed bag. But LLMs do tend to perform better on average.

>>102210090
Gonna play Star Trucker today when it comes out, with an LLM space-hooker running in the background, kek.
>>102210222
Trying out Mistral Large. Haven't run GGUF in ages, so it hurts like a motherfucker.
>>102210069have you ever RPed with a real person? have you ever sexted someone over a messaging app? let me tell you, even with my local taking 30 seconds or a minute for a long reply, it's way fucking better than a real person. i don't even care that i can't send my model dick pics, the dialogue is perfect and they don't start bitching at me or ghosting or just complaining about their life for an hour then leaving. also, no mental illness unless you specifically prompt for it.
>>102210298
>i don't even care that i can't send my model dick pics
should we tell him?

>>102210316
>send dick pics to LLaVa
>continue ERP with Mythomax
now you're thinking with portals.

>>102210294
>Trying out mistral large
reasonable. I wouldn't consider 1t/s unless it was god tier with no need to swipe. If that's what you find, give me a (You)
Texting and text-based RP is a shit experience. I only do in-person shit, never waste my time with that text shit.
>>102210326
>l2
anon...

>>102210298
>send dick pics
>ghosted
Yeah... crazy how that works bro.

>>102210248
I mean there have been human partners who I would take over AI any day of the week, but those encounters tend to be fleeting. I mean I'd take my ex back even for him taking 10 minutes to reply with one hand. But that's not going to happen. /lmg/ is my new bf (yeshomo)

>>102210316
tell me WHAT? what is there to tell, magus of constructs?

>>102210335
That was part of the joke. Remember back when that Mythomax guy used to shill the fuck out of his model? And then everyone else started shilling it ironically for the meems. Those were the good old days. And then Mixtral came out and people finally stopped talking about it.

>>102210222
NTA, but Mistral Large finetunes are worth putting up with 1t/s.

>>102210298
>>102210316
What's the reason for sending dick pics?

>>102210343
sillytavern has image upload, koboldcpp supports it, and there are multimodal models that can process and respond to images

>>102210382
The classic mistake of assuming that the fairer sex is as interested in seeing your genitals as you are in seeing theirs.

>>102210381
Couldn't find a finetune better than the official one

>>102210398
There's a high risk of a girl having a gross-looking vagina. It's important information to have in advance.

>>102210401
Still true.

>>102210398
Does this speak more of the fairer sex's psychology, or my fat ugly ass?

>>102210381
I got 0.6 to 0.7 t/s. I just couldn't do it. Maybe if it was twice as fast I could be patient enough.

>>102210354
>mythomax
>shill
? mytho was an early good (erp) tune. there were plenty of good ones but it was hardly shilled

>>102210382
>usecase of penis pictures?

>>102210354
not ironically, it was actually good
>inb4 waah shieeeaaaallll
>>102210490
>>102210449
my impression is they don't get much out of a dick pic even if you're hot, but if you're hot they're more likely to tolerate it / pretend to like it
their sexuality doesn't work exactly the same as ours

What causes Ooba to occasionally need to reprocess the whole prompt context when you hit regenerate, even when nothing's changed?
It doesn't happen that often but it's annoying when it does
>have fox ears
>girl still somehow nips my earlobes

>>102210550
I used to have this. I stopped noticing it after enabling flash_attn

>>102210554
Follow the diagram.

>>102210563
Alas, it is already enabled

>>102210569
Ooba up to date? I recall having to enable it in both the model and session tabs on older versions

>>102210398
Idk my gf keeps begging for dick pics

>>102210592
She's selling yours and many others' to gay men, you fool. She's the local dealer!
in some rough A/B testing, Largestral Q2_K_L seems as intelligent as IQ3_M while generating tokens 50% faster (1.5 t/s vs 1.0)
All the other Q2 quants are dumber than IQ3_M as you'd expect, so I guess the L actually does something

I am going to pull the trigger on a used 3090 for $1100 Canadian. My 4070 isn't cutting it. I am depending on using both cards instead of waiting for a 5080 that may be 24GB, but will probably be 16GB

>>102210780
Really hope the 5090 won't be the only one > 16GB. What's the expected release date, next year?

>>102210814
The expected date is Q4, before Christmas, for the 5090 and then the lower cards in the new year. There are shouts from some semi-reliable people that delays are pushing to 2025. You can also build a semi-reliable reputation from saying that any company is going to have delays over and over. nvidia isn't talking much and not confirming anything. It would be a very good idea for stock prices to have the 5090 available for Christmas break when all the nerds can game.

>>102210850
>5090 available for Christmas break when all the nerds can game.
When was the last AAA even worth playing?

>>102210565
inpainting too much work

>>102210842
True, NAI wouldn't have given him such pretty nails.

Has anyone tried Magnum 123B or 72B (is the 72B even good?) for creative writing?
It's trained on Claude logs, so I'm assuming its prose should be similar enough to it. I need it to rewrite my shit 8th-grade-fan-fic-level drafts into something not completely garbage, before I throw it into Claude 3 proper for one big tard wrangle, so I don't have to deal with usage limits.
>>102211041
try rebooting it

>>102210901
2020

Has anyone ever experienced an emotion as a sensation in their spine?
I'm curious about how spine chills and spine shivers became such a common metaphor in low-quality human writing (and from there into AI writing).
I've felt strong emotions in my stomach and chest, but I can't recall ever feeling one in my spine.

>>102211302
strong emotions for me tend to be felt in the stomach or in extreme cases as lightheadedness or queasiness in the case of shock.

>>102211302
>Has anyone ever experienced an emotion
No. Emotions are for the weak.

>>102211302
https://en.wikipedia.org/wiki/Frisson

>>102211302
LLMs are primarily trained on lmg logs

>>102211302
not exactly in my spine but more like my back. i think a shiver down a spine is that little wiggle your back does when you see something really nasty or arousing

>>102211302
>emotion
I think it's supposed to be a visceral reaction rather than an emotion, like a sinking feeling in your stomach if your mom were to ask you about those weird chat logs she found on the computer.

>>102210005
There's still 0 news about the "transformer killers" like retentive networks and that one Chinese architecture?

I wish there existed a small specialized model trained to convert any given text into good prose. Obtaining a good dataset for it is quite simple: shred some quality books into small pieces, then instruct GPT/llama to rephrase each one, use that slop as inputs and the original texts as desired outputs. Could this actually work?
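The dataset half of that idea can be sketched in a few lines. This is a rough sketch under assumptions, not a tested recipe: `slopify` here is a hypothetical stand-in for the actual GPT/llama rephrasing step, which you'd replace with an API or local-inference call.

```python
import textwrap

def chunk_text(text, max_chars=2000):
    # Shred a book into small, roughly uniform pieces.
    return textwrap.wrap(text, width=max_chars)

def slopify(chunk):
    # Hypothetical stand-in for the GPT/llama rephrasing step; in
    # practice this would call a model to rewrite the chunk in slop.
    return "[rephrased] " + chunk

def build_pairs(book_text):
    # Inputs are the slopified rewrites, desired outputs the originals,
    # so the finetuned model learns the slop -> prose direction.
    return [{"input": slopify(c), "output": c} for c in chunk_text(book_text)]

pairs = build_pairs("some quality book text " * 300)
```

The open question is whether the synthetic slop is close enough to real model output for the mapping to transfer; the chunking and pairing mechanics themselves are trivial.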
>>102211302
I get it from pretentious speeches. They don't even have to be good. I also sometimes get the same sensation when it's cold out, but it's different from just standing out there shivering.
>>102211700
Not entirely sure what you're describing, but if it's an actual movement of your body that's not it.
I wonder what percentage of people have to be able to feel a sensation in order for a description of it to become fixed as an idiom. From this conversation and others in /lmg/ and /aicg/, it seems like most people don't. I also have memories of trying to describe it as a kid and being met with blank stares.

>>102211302
yes, but i still had my tail. I wish the scar did something besides make my ass crack huge.

>>102211892
It would only take so long before the new shivers are found. People seem to be tired of them not because they're bad writing necessarily, but because they see them constantly. Which they do, because the models always do the same thing. And because they haven't read this much since high school.
The other problem is that 'good writing' is subjective. I've read a few novels where, if you remove redundant adjectives, you'd end up with 1/3 of the book gone. There are some 'good books' that i can't stand, as much as i appreciate the writer themselves. I like listening to them speak, but not so much their written words.
Just training on complete works from a big variety of genres should be an improvement, as long as people start doing something other than coom. They're not that bad as they are.
Anthropic seriously thinks they can get away with actively making their product worse (disabling NSFW content) while trying to be a billion dollar company. What the fuck are they smoking?
>>102211302
https://en.wikipedia.org/wiki/ASMR

>>102212247
Anthropic '''disabled''' nsfw content from the start.

>>102212461
No, anon is right. They now actively add hidden prompts even in the API, including stuff like not quoting copyrighted text etc. Which obviously causes all kinds of issues. Saw a couple posts of users now not able to get a summary of their PDF.
Very weird. Anthropic needs to appease to get more users. Even Sonnet 3.5 is not enough for the normies to switch.

>>102212247
they successfully raised almost $8 billion last year but you (coomer who goons to AI text) are right, they'll never be a billion dollar company. if they listened to us they would actually be successful like all the other wildly successful AI startups worth > $8 billion that produce smut better than opus.

>>102212517
How does that contradict what I said? I'm saying it's insane to try to have their company be that good while actively making it worse for no reason

>>102212247
It's very simple. You provide erotic content? You will limit and censor it heavily, or payment processing companies will refuse to grant you a right to their services.
The limitations they enforce would shoo away most of the userbase that uses their products for erotic purposes, so they might as well just drop it entirely and focus on consolidating their SFW userbase.

>>102212545
>You will limit and censor it heavily or payment processing companies will refuse to grant you a right to their services.
Is that true? I doubt that's why they're doing this, but in that case what are the payment processors smoking, limiting their business?

>>102212558
they're all owned by religious nutjobs

>>102212558
>Is that true?
are you 11? this is extremely common knowledge if you've been online for more than 6 months.

>>102212558
I can't really speculate as to why, because I honestly have absolutely no idea why they're doing it. Might be for religious reasons, like >>102212571 said. It's strictly western companies who do this, too.
The latest example I can think of is: https://nichegamer.com/dlsite-temporarily-blocks-major-western-payment-processors/

>>102212558
>>102212571
bet they're jewish
>>102212050
>It would only take so long before the new shivers are found. People seem to be tired of them not because they're bad writing necessarily, but because they see it constantly.
I've been thinking it would be cool to have a system where you can randomize the prompts a bit. Something like randomly swapping out adjectives or generic instructions for how the output should look.
Though the problem with that would be that you would need to reprocess the prompt each time.

>>102212617
Anon, things cannot be Jewish. A company can have Jewish employees and/or it can have a Jewish CEO. Said company could also be being propped up by other companies with connections to Jews. But it cannot actually _be_ Jewish. Please cure yourself of this /pol/ mindrot.

>>102212586
They do not want to promote objectification and the like. Google and companies like it are run by 90% non-religious people. But in reality radfems and christcucks reinforce each other; the groups have the same ideology.

>>102212558
I don't know if that is the reason, but I think there was some extremely retarded US court that decided a payment processor could be held liable for content on pornhub or something.

If I have the model output stories in markdown boxes, would that worsen quality?

>>102212640
>umm no you see technically a company led and controlled by jews is not jewish itself
>>102212806
Yes, that's right.

>>102212806
Jews happen to be overrepresented among rich assholes but it's not like non-Jewish rich assholes are any better.

>>102212907
They must abide by their rules to be in their position.

>>102212929
If only you could actually point to this "they". I really wonder if this brainrot is terminal.

>>102212962
You know exactly whom I am referring to

>>102212981
No, anon. I do not. No one does.

>>102212907
I like Musk.
>nooo he's literally hitler omg omg
>>102210005
How does Theia compare to Rocinante? Worth the extra VRAM required?

>>102212994
Same. He's a massive sperg and I really dislike his egotistical personality, but the things he has achieved and the work he is doing are a MASSIVE benefit to humanity as a whole and outweigh the dumb retarded shit he does.

>>102213046
they are both shit
buy an ad

>>102213059
>He's a massive sperg and I really dislike his egotistical personality
He's clearly playing up that personality to play the PR game, the same way he did when Tesla was being shorted, because he knows there's a huge part of the American population that loves that shit, and every time he says something stupid the media gives him free publicity. Bush basically did the same acting to get elected president.

>>102213075
No, like, I legit just wanna know what model to use as somewhat of a VRAMlet. If you got a better recommendation that's not 70B go ahead and tell me, otherwise I heard Rocinante is best

>>102211892
It's doable, but then you have to either train a model on the data or convince others to do so. The vast majority of tuners don't know how to utilize a raw corpus.

>>102213081
Fuck me, I hadn't even considered that. That does explain why he got so much worse leading up to his Trump endorsement.

>>102213087
>otherwise I heard Rocinante is best
and I heard you should purchase an advertisement. Hiromoot is exit scamming because of people like you

>>102213133
Okay, let's say Rocinante is worse than Petra-13b. What is the best model then?
>>102212907
so true sis! viva la revolucion, trans rights!

>>102213144
we've played this game before. i'll tell you to use the official instruct, and you'll just insist that whatever you're shilling today is better. fuck off

>>102213170
Who do you think I am?

>>102213087
>>102213144
>>102213175
Please do not feed the troll. Thank you. Also, ask again in a few hours; the people who provide actual discussion haven't woken up yet.

>>102213144
Pyg 6b

>>102213185
when does the rest of anthracite wake up?

Is there any way to make the model remember what the fuck happened in the story?
Always the same shit: everything is going great, but then it hits a wall continuing the story because it doesn't remember anything about it, just the previous prompt or a couple of them.
I've tried copying the whole story, cleaning it up, then starting a new chat and pasting it; the model still doesn't know what the fuck is going on. Feeding it a .txt is even worse.
Any tips?

Hi all, Drummer here...
>>102213046
I haven't heard feedback comparing the two (Rocinante vs. Theia), but Theia feedback so far is that v1 & v2b follow instructions really well, are much more stable, and punch above their original 12B weight.
v2b (WIP Theia v2): https://huggingface.co/BeaverAI/Theia-21B-v2b-GGUF
Theia (especially v2b) is just Rocinante in a 21B body.
>>102213328
I see, thanks! I'll try running it then. So far Rocinante is great, so I have high expectations for Theia
jamba.gguf please
>>102213356
I'm still gathering feedback for Theia, so please do drop it here when you've coomed to a conclusion. (Also worth a try: v2d)

>>102213328
I'm this anon >>102213299
Just wanted to say that Rocinante 1.1 is the best model I've tried so far when it comes to writing novel-style stories and stuff.
If Theia is as good I'll try it right away with the same story I'm trying to make Rocinante remember... I'll let you know how it compares.

>>102213374
Rocinante v1.1's equivalent is Theia v2d. Unfortunately, I had to make some really questionable merge-fuckery and I'm not too confident with it.
Thank you for your feedback! What chat format do you use for assisted storywriting / instruct-guided stories?
Could you provide an example of your problem? Still trying to understand it.

>>102213402
The easiest example of the problem would be:
I just prompt a short story, then at the end I simply ask what happened in a specific part of the story, for example "what happened at Veronica's party".
It then proceeds to get most of the story wrong, or transforms the events into something different while maintaining some core stuff from what actually happens in the story.

>>102213402
NTA, but I'm wondering why you have 4 suggested templates for Rocinante 1.1. Did you train it with several templates? If so, why?
Also, based model name
>IT'S HAPPENING IT'S HAPPENING
>IT'S HAPPENING IT'S HAPPENING
>IT'S HAPPENING IT'S HAPPENING

>>102213492
Buy an ad, saltman.

>>102213492
Oh shit, it will be RLHF-lobotomized 100 times faster?

>>102213462
But this happens with every model I've tested so far.
>>102213402
I forgot about the chat thing. What I do is I just start with an overall prompt for the story, like:
"Can you help me write this story? Michael gets home after a hard day at work. He goes to the living room; his wife Emily is there watching TV. He then goes to sit next to her, but something feels off. Michael gets nervous as he's been cheating on his wife."
Then after the model does its thing, I read it and edit what I like and what I don't. After that, I prompt just a line or two of the start of the next chapter or block in the story so the model has some direction of where to go.
This is the best method I've found so far.

>>102213492
The strawberry bullshit has clearly shown that the OAI cunts don't care about realistic expectations. It's bullshit until proven otherwise.

Does it make sense to introduce distortions in some dataset images in order to diversify an otherwise monotonous dataset that is prone to overfitting?
The guides are contradictory on that. Some say that a single bad image can ruin training, but then there are built-in options to randomly crop and hue-shift pics, and those can distort an image quite a lot.

>>102213462
What chat template? You might get the best results with Mistral for logical reasoning.
>>102213465
Yep, I did. I like the idea of Roci users trying out different chat templates to see what works best for them. Try Roci's storywriting in Alpaca and Mistral, and note the significant difference in writing. There are pros and cons to each template.

>>102213492
But can it stop people from doing useful things better than GOODY-2?

>>102213299
>Is there any way to make the model remember what the fuck happened in the story?
Look into RAG, although the current implementations aren't exactly perfect. We had an anon a few threads back saying he's working on a prototype for a different RAG approach, but that's probably gonna take some time to come out.
TFW Gemini tries to say "nipple" but silences itself. Are the censorship, repetition, and ellipses the result of hiring the C.AI guy?

>use draft model
>2x slowdown
I can't believe I fell for the draft model meme

>>102213492
GPT-4 is not "exponentially" better than GPT-3, but that doesn't mean it's a lie. exp(-x) may still show improvements over time; we call that diminishing returns.

>>102213861
This is just cruelty at this point. Those fucks will cause an emancipation movement with their "ethics".

>>102213916
These ethics are bullshit regardless. Censoring "hate speech" may make some sense, but censoring the names of body parts and sexual stuff in general is just stupid; it's literally removing the basis of humanity
Aphrodite got updated to 0.6.0, it's been a while. Has anyone tested it?
https://x.com/AlpinDale/status/1830906395169882288
https://github.com/PygmalionAI/aphrodite-engine
No support for exl2 though. Alpin recommends AWQ-marlin. I've never quantized AWQ to be honest. Seems like AutoAWQ is the way to go?

>>102213938
LLMs are not human

>>102213964
That's not even the point. LLMs are tools used by humans

>>102213960
why would anyone use a vLLM ripoff?

>>102213975
>LLMs are tools used by humans
I wonder what would happen if Win95 released today. MS Paint, Notepad. Nowadays the first thing people point out is that you can make all sorts of weird shit with it. Responsibility needs to be put back into the users' hands.

>>102214002
it supports way more quantization formats

>>102214121
Either you have enough VRAM and super-specific quantization formats are unnecessary, or you don't and you use llama.cpp.
>>102210747
Testing it now and it feels worse than IQ2_M for me. The IQ2_M is from Legraphista or however that's spelled, so idk if that makes any difference.

>>102214143
this.

>>102214143
If you want to run a 70+B model you pretty much need some sort of quantization. Even Q8 would halve the memory requirements. And being compatible with multiple quantization formats can be beneficial, as some are faster and some have better precision.
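The arithmetic behind "Q8 would halve the memory requirements" is just parameter count times bits per weight. A back-of-envelope sketch (weights only; this deliberately ignores KV cache, activations, and runtime overhead):

```python
def weight_gb(n_params_billions, bits_per_weight):
    # GB for the weights alone: params * bits / 8.
    # Ignores KV cache, activations, and per-layer overhead.
    return n_params_billions * bits_per_weight / 8

fp16 = weight_gb(70, 16)  # 140.0 GB for a 70B model
q8 = weight_gb(70, 8)     # 70.0 GB, half of FP16
q4 = weight_gb(70, 4)     # 35.0 GB
```

Real quant formats add a little overhead for scales and zero points, so treat these as lower bounds.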
>>102213555
>The strawberry bullshit has clearly shown that the OAI cunts don't care about realistic expectations.
"High" and "realistic" are not mutually exclusive. In this case, downplaying them by letting people believe there will be incremental improvements would be unrealistic and lead to people being completely blindsided by what's to come.

>>102214212
vLLM does support quantization.

>>102204871
no

>>102214212
hi Alpin, buy an ad

>>102214287
we didn't get bombarded like this yesterday. i guess even shills take the holiday off lol
>>102213757
Using RAG (supposedly in openwebui you just import the document and then use # with the name of the document in the prompt) has the exact same effect as just pasting the complete story in a single prompt, or using the file importer and feeding the model the text in a .txt or whatever.
So either I'm doing something wrong or RAG is also useless for this problem.

What's a good model for RPing these days?
I'm still running Toppy-M-7B.q8_0.gguf on koboldcpp. I wanted to check out Merged-RP-Stew-V2 but will never fit it into my 3080's VRAM lol

>>102214306
I'm sorry... I felt bad for the innocent guy who got harassed for bringing up two of my models.

>>102213757
RAG is a meme
>According to Stanford, even pro-grade RAG systems (the kind used by lawyers) are only right 65% of the time at best

>>102213938
Surprisingly, making it say "vagina" was not that hard. The model tried to weasel away using the term "inside" once, but it was easy to fix. It's the nipple where it drew the line.

Is it possible to NVLink a 3090 and a 3090 Ti together?
>>102213373
Hey, I tried it and really liked it. It certainly does feel a bit smarter than Rocinante; however, running it was painfully slow for me (0.5 tokens per second), so I'll stick with Roci for now

What's the best storywriting model in your opinion?

>>102214608
Gemmasutra 2B

>>102214332
>>102214403
Did a quick test. It does actually work if the content is way smaller, like around 2000 words.
Adding the full story and then adding the additional RAG with only the part I'm interested in does not work; maybe the model gets confused?
So perhaps the solution is to break the story into blocks of 2000 words, then make a RAG file for each one and feed it to the model for each prompt. I'll test that next.
>>102214608
What frontend do people use for storywriting? Is there anything better than Silly?

>>102214332
Yeah, that's what I meant with "the current implementations are lacking".
What you want to do is implement a vector database and insert any and all messages into it. Then when you prompt the model, instead of inserting the entire context, you retrieve relevant messages, process those, and inject that into your context. This turns the last N messages into the model's short-term memory and every message past that into the model's long-term memory.
Imagine the following prompt:
>i have a meeting at 8 pm
This gets stored in the vector database, optionally in a specific memory-typed format, perhaps including a timestamp.
Now, when the following prompt is made a hundred posts later:
>when did i have that meeting?
The prompt is compared with the vector database (optionally converted to the same memory-typed format for better compatibility) and all relevant entries are retrieved. The model is then tasked with summarizing the retrieved entries to save context length. This summary of the model's long-term memory is used in tandem with the model's short-term memory and the user's prompt to create a new prompt.
About the memory-typed format I'm talking about: the model could be asked to turn prompts into different forms through a pre-written context. "I have a meeting at 8 pm" could for example turn into <appointment><meeting><time><original prompt: (prompt)><memory created at: (timestamp)>, which could make it easier to retrieve more relevant prompts. For example: "When did i have that meeting?" could turn into <appointment><meeting><time>, corresponding a lot better with the modified stored prompt than the original prompt. The summary should be made from the original prompt (and perhaps the timestamp), however; the tags would not be of use.
Now that my schizorant is over, does anyone have any questions?
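The store-and-retrieve half of that idea fits in a short, stdlib-only sketch. To keep it runnable here, the "embedding" is a toy bag-of-words vector; a real setup would use a sentence-embedding model and a proper vector database, but the retrieval logic is the same shape.

```python
import math
from collections import Counter

def embed(text):
    # Toy bag-of-words "embedding"; swap in a real sentence-embedding
    # model for anything beyond a demo.
    return Counter(text.lower().replace("?", "").split())

def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class MemoryStore:
    def __init__(self):
        self.entries = []  # (embedding, original prompt, timestamp)

    def add(self, prompt, timestamp):
        self.entries.append((embed(prompt), prompt, timestamp))

    def retrieve(self, query, k=3):
        # The k stored prompts most similar to the query; these would
        # then be summarized and injected into the model's context.
        q = embed(query)
        ranked = sorted(self.entries, key=lambda e: cosine(q, e[0]), reverse=True)
        return [(prompt, ts) for _, prompt, ts in ranked[:k]]

store = MemoryStore()
store.add("i have a meeting at 8 pm", "msg 12")
store.add("the weather is nice today", "msg 47")
hits = store.retrieve("when did i have that meeting?", k=1)
```

The memory-typed tagging described above would slot in as a preprocessing step inside `add` and `retrieve`, normalizing both sides before embedding.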
>>102214608
None. All models suck for story writing, and I'm not even joking. It's sad, really.

>>102214643
Silly is garbage for story writing; Novelcrafter is much better. Mikupad is also nice if you want total control.

>>102214698
The fact that you're mentioning the Hyperloop tells me that engaging with you in a discussion about this topic would be fruitless, because your blind hatred is preventing you from changing your mind.

>>102214698
America literally would not be in space at all without SpaceX, and Starlink is finally killing off shitty ISP monopolies worldwide.
Who let the muskrats in?
>>102213059
https://youtu.be/rPt9hAC24MI

>>102214753
Are you lost? This isn't reddit.

>>102214753
WHO, WHO, WHO WHO WHO

>>102214773
>imagine being a parrot and getting all your opinions from a fucking youtuber

>>102214777
You are the one who's lost, anon. We all hate Musk.

>>102214811
>we
Fuck off with your groupthink bullshit and go the fuck back.

>>102214827
What? Can you repeat? It's hard to understand you when you have a billionaire balls deep in your mouth.

>>102214827
Please don't feed the troll, anon.

>>102214456
I've noticed their models seem weirdly triggered by discussions of licking pretty much anything

>>102214811
Finally someone says it. Muskovites have been getting uppity, and it's about time they remember who's in charge here.

>>102214628
Nope, doesn't work. The result is even worse than just feeding the whole thing. For some reason it does work with just a small chunk of the story of about 2000 words. What's the reason for that?
Just tried Mistral large and holy shit is it so much better ~70b models I've been testing over the last couple of weeks. Even at a lobotomized Q_2 quant, it blows most other models out of the water when it comes to rp.
>>102215135>better than
Random subjective report: Wiz2 8x22B (Q4KS) appears still superior to Llama 3.1 70B (Q6K). I had a moderately mysterious medical mystery the other day, and asked them both. Llama 3.1 gave me this insane esoteric bullshit (real, to be clear, but a weird neurosurgery niche thing) while Wiz2 pointed me in the correct, much more mundane direction.Both given their preferred prompt format (Vicuna/Llama3) in the new llama.cpp server UI, "reasonable" basic minP-only sampler settings, low temp, not otherwise optimized. I don't know, it's just one data point, but I thought for sure the fancy new 3.1 would at least be equal to Wiz2 in all cases.What is the (non-ERP) meta nowadays? Is Wiz2 really still the best? (Assuming 400B is stupidly out of reach).
>>102215192
>Assuming 400B is stupidly out of reach
Just buy $30 worth of RAM and learn some patience.
>>102215192Mistral Large 2 is better than WizLM
>>102215218Is there anything in-between a full GPU setup and consumer CPUs?Like some weird ASICs or giga-cored CPUs that are bad at regular shit?There has to be an option to get half the t/s for half the money, right? Otherwise a niche is missing in the market.
>>102215192In my tests Llama 3.1 has a lot more knowledge than Wizard, but I'm not doing medical knowledge so maybe that's different.
>>102215135
I agree, but imo it's still bad. For me, the models are categorized as follows:
<=8B - Unusable.
<=21B - Decent, but it's stupid af, will easily write logically flawed replies.
<=72B - Good, it doesn't make as many logically flawed replies as <=21B.
<=123B - Good+, it still writes logically flawed replies from time to time but slightly than <=72B, just slightly.
>>102215335slightly less than*
>>102215321The Macbook SoCs I suppose.At least in the technical sense, I don't know about the price.
>>102213611I don't recall mikufluxfags turning this thread into SD general before, why start now?
>>102215335I pretty much agree. Just had a gen from Mistral large of a character trying to take me from an airborne airplane bathroom to "somewhere more private"
>>102215218
No can do, I'm getting >6t/s with Wiz2. And 256GB of RAM is not $30. Why am I even replying to this.
>>102215246
Thanks, I'll give it a try. I have admittedly not been keeping up the past few months.
>>102215324
Yeah fair enough, I can believe that. Maybe medical is a weak spot - actually that wouldn't surprise me, given that medical is an area the "AI safety" lawyers get all squeamish about, and so the much freer Wiz2 would do better.
I've been wanting to set up a semi-rigorous blind testing setup, and also explore sampler param space a little. Never enough time/energy for anything these days!
How do I do beam search in ooba again?
>>102215405Ngl cargo hold exists
>>102215614>NglI don't think this means what you think it means
>>102215602Beam searching is when you go to the bathroom at 4 AM and piss until you can hear the water splashing
>>102215639was meant to type desu and somehow brain decided to not work
>>102215614True, but I don't think you can enter the cargo bay through the passenger section in most commercial planes
>>102210005
>no new models since last week
Pack it up, boys. It's officially over.
>>102215761kill yourself shill
>>102215405maybe she has a private bedroom on an air emirates flight
>>102215405You should ask the reasons in ((OOC: ))
>>102215335True. My most recent gen with Mistral Large being retarded is in a time travel card. My character traveled to the past and met his grandmother when she was 18 years old, they became friends and then, when I revealed that she is his grandmother, her reply was "What are you saying? I'm only 18, I can't possibly be your grandmother *she studies her face looking for any signs of deceit but finds none*"This completely shattered my immersion.
>>102215976A completely normal response from someone not interested in sci-fi, struggling to grasp the concept of time travel.
Apparently OpenAI representatives in Japan are telling people that OpenAI will release a sequel to ChatGPT 4 this year. They claim it's at least twice as intelligent.
>how is this related to local models
Because after it comes out, open source models can finally increase in quality again.
>>102215246Followup question: what is the Mistral Large 2 quant situation? When I was last paying attention, it was understood that L3 was packed "fuller" than previous models, so quantization hurt it worse.What's the situation for Mistral Large 2? I think its 70GB Q4KS is going to need too much offloading from my 72GB VRAM to be usable. Is an IQ4XS or IQ3M still going to beat Wiz2 Q4KS?
>>102215976Anon... are you an LLM? Do you lack a theory of mind?
>>102215976I know you think everyone understands and is open to the very concept of time travel, and that they'd accept it on the spot.However, that belief is born from your own retardation.
>>102216025>>102216145>>102216184I disagree! A normal person would first ask about the time travel part rather than asking about the "being her grandson" part. Also, this would sound too absurd to anyone and they would first think it's a joke or just say "wtf are you saying? are you drunk?"
>>102216045
>It's at least twice as intelligent
What does that even mean exactly? Intelligence is not something you can measure in this way. Pure shill.
>>1022162132x MMLU score
>>102216199
>A normal person
You are not a normal person for starters, so why are you trying to infer how a normal person would react?
Anyway, one hundred people, one hundred reactions. Move along.
>>102216025>>102216145>>102216184Also, picrel is another swipe.
>>102216247This one is just stupid, yes.
>>102216244Just accept that Mistral Large isn't perfect. Stop this blatant cope.
>>102216286I'm not talking about Mistral Large, I'm making fun of you, stop moving the goal post.
>>102216045
>open source models can finally increase in quality again
Again? There are significant new improvements like every month. It's just that people here are spoiled cry-babies.
>>102216307
>It's just people here are spoiled cry-babies.
A dance to the truth
You know, I always thought it was autists who have difficulty understanding that other people have their own perspectives on things.Is this place just filled with autists or is this not strictly an autistic thing?
>>102216352This place is filled with retards, not autists. The autists left long ago.
>>102216045I fucking hate arbitrary and meaningless axis labels so much.That plot is utterly useless.
>>102216362100x bigger, 2x better
>>102216357>The autists left long ago.Are there any better places to discuss theories and thoughts about LLM in general?I tend to write down my theories and thoughts here, but if I can do so in a place where people find that useful rather than annoying I'd rather do it over there.
>>102216374
>you need to increase a model's "intelligence" (whatever the fuck that is) one-hundred-fold for it to become 2x "better" (idem)
I love LLMs
>finally try mistral-large at IQ2_XXS with Q4
>it UNDERSTANDS
>but it's slow
I can't go back to stupid models. Now to look at finetunes, I guess.
I asked Largestral where Alice will look for her glasses and it said she'd check the drawer where she put them last, even though I CLEARLY explained that Bob hid them under the sofa cushion while she was away. I've had 7B models get this right but Largestral is just kinda retarded for its size.
>>102216238MMLU is almost "solved". What percentage improvement on this meme benchmark means that my model is "twice as intelligent"?
>>102216410Meanwhile I'm here down in the mud with my 8gb of VRAM, constantly having to rewrite context to get my models to write what I want.
>>102216391If there was, I wouldn't still be here. There's reddit, but I wouldn't expect any useful discussion from there. I assume the only productive discussion comes from private communication between researchers.
>>102216298That doesn't hold much weight coming from you anon, try again once you find who I am.
>>102216442Oh, no you don't.You asked for it, you're going to suffer the consequences for it.You want to learn how to use ComfyUI, go to civitai.com, make an account, turn off the nudity filters and download a nudity LORA.You can then input the image in ComfyUI and use the model + LORA to remove the clothes through prompts.
>>102216496>pulling the "you don't know who I am card" on 4chan
>>102216488There's no like discord or matrix servers or something?I wouldn't know how any of those work, I've spent all my life in this place.
>>102216496You're on 4chan. This means you're an autistic misfit who has a skewed look on society as a whole.
>>102216517>
>>102216440maybe real life grinding is the answer
>>102216536Nah, I consume enough anime to know what a normal social interaction looks like.
>>102216522I know nothing about matrix, but we get daily discord raids shilling their sloptunes. Try one of their models and you'll see for yourself that they have no idea what they're doing and it's just a redditors sekrit club.If you find where all the non-stupids are, please let me know.
>>102216569Any non-stupid person would be employed, so you probably can find them on LinkedIn.
>want to taste XTC kino
>not using koboldslop
do you think gemini could help me hack it into tabby....
RECKLESSABANDON
this general couldn't be more dead
only the absolute retards are left
>>102216781explains why you're here
>>102216781one of us, one of us
>>102216410Q4 KV cache? I thought anons said quanting the cache made models bad
>>102216612Not a bad idea. Maybe I will go cold call some folks and ask them if they have a discord.
>>102217029
>I thought anons said quanting the cache made models bad
genuinely you can't trust what 99% of anons in these threads say, ever. most of these faggots couldn't even get past launch model pains as they get filtered, call the model shit, then move on.
anyway quanting the cache does nothing to the quality at q4, it's amazing.
>>102217049Not exactly "nothing", rather something. I think cuda dev said that v cache takes a harder hit than k at q4. Ideally run k at 4 and v at 8, but that requires compiling llama.cpp with a special arg.
>>102217049kek, yeah, nothing at all. Geez, I wonder why it's not the default.
>>102217080>>102217082wait my tired brain just realized what you're actually talking about, nevermind, completely forget what i just said like your brain only has 2k context length.
>>102217080
It's the other way around: K cache needs more precision than V cache.
See https://github.com/ggerganov/llama.cpp/pull/7412#issuecomment-2120427347 .
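Per the linked PR, mixing cache precisions in llama.cpp might look like this (flag names are from llama.cpp builds circa mid-2024 and should be verified against `--help` on your build; some K/V type combinations also require compiling with the extra flash-attention quant kernels enabled):

```sh
# K cache kept at higher precision (q8_0) than V (q4_0), per the PR above.
# Flag names/values are an assumption from mid-2024 builds and may have changed;
# -fa (flash attention) is required for a quantized V cache.
./llama-server -m model.gguf -fa \
    --cache-type-k q8_0 \
    --cache-type-v q4_0
```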
>>102217029i've been using q8 kv and it seems fine
>>102217144Gotcha, I had a 50% chance to get it right.
https://www.ebay.com/itm/145884743441 $165https://www.ebay.com/itm/156345132288 $29 x2https://www.ebay.com/itm/266946767074 $25 x12$525 for 1.05 Tbps memory bandwidth, about the same as a 4090. Effective memory bandwidth will drop off past 40 GB as you saturate the 2x16GB on-package memory, but you'll have a total of 416 GB of RAM to play with. I guess you could also do it way cheaper and just get 12x4GB of DRAM, you'll have a total of 64GB for about $180 less.
>>102217344a 4090 has 1 TB/s not 1 Tbps
>>102216432Not solved enough. Not much has changed there since the original gpt 4. Meanwhile math and code meme marks have increased by massive amounts
>>102213492Oh god stop it! local llm turd is already dead!
>>102213916>>102213938It's not cruel and (you) are enabling it anyway, by using the same shit locally.
>>102213492>>102216045That's GPT 4o, retards. GPT 4o isn't released in Japan yet.
>>102216045
*open source models can finally increase in censorship quality again.
there, fixed it for you.
>>102217609What did anon mean by this?
>>102217643wow the graph is going up, yet the east is falling...
>>102217659billions must prompt.
>magnum-v2.5-12b-kto
seems broken. lots of little text errors that almost seem like bad sampler settings but persist no matter what i do with them. is mini-magnum good, or what's the current 12b coomtune?
>>102217344Damn, nevermind, looks like none of the Xeon Phi processors support multi-socket configurations. Rip the dream.>>102217513I mistyped that, it would have been 1TBps if everything didn't suck.
>>102217643wtf, it's a bigger jump than the GPT3>GPT4 jump.I bet this will be just strawberry.
>>102217790
>I bet this will be just strawberry.
I honestly think the strawberry schizo was correct in that the internal project is called strawberry and that it has actual reasoning capabilities.
>>102217827Get some taste.
>>102217790
Given that the curve and points are barely connected, it's "eras" and not even model names (not to mention OAI management's hallucinations about intelligence), none of it should be taken remotely seriously.
>>102217833
>it has actual reasoning capabilities
Don't say that to the fans, they will skin you alive for implying it hasn't had it for years
>>102217891
>none of it should be taken remotely seriously.
Of course not, it's marketing slop made for investors who are conditioned to invest when they are promised that the line will go up.
It's fun to speculate, thoughbeit.
>>102217827This made me laugh
https://github.com/gpt-omni/mini-omni
RWKV won
https://xcancel.com/picocreator/status/1831006494575464841
>>102218019>artificial jew on ur pc nuked from OS day-one.
>>102218012That demo is really impressive.
deadest of generals
>>102218012
https://huggingface.co/gpt-omni/mini-omni/discussions/2#66d70791169f9a7cb83b9cec
>If you want to change the LLM model, you have to retrain the whole audio parts.
https://huggingface.co/gpt-omni/mini-omni/discussions/1#66d70763b61dd11022a80bd5
>For the training code, there is currently no definitive release timeline.
Niggers.
>>102218019Is it still as mediocre as it was 3 years ago?
I should start up my army of local Mikus to populate the thread.
>>102218418that won't make you any less lonely or make the general any less dead
>>102218410
>If you want to change the LLM model, you have to retrain the whole audio parts
this is why multimodals will never be good
Just tested out Q8 KV cache compared to no KV cache quanting. It's not great. Seems to be less capable of remembering things from the context. So honestly I do believe it when they say it's not worth quanting the KV cache. However, if you have a HUGE context, maybe it'd be worth it. But for 32k, I feel fine taking a small hit to speed for the better attention to context.
>>102218474I think the best solution is for an LLM to be trained to accept input and output of a certain modality, but to keep those models separate architecturally. That way, you could swap any compatible components. Like brain legos, but for transformers.
>>102218551that's how some do it now, you can load image and audio models alongside a text model with kobold and use it all together for example. they're never going to release a multimodal where the image gen is better than choosing a popular tune, so that whole part of the model is wasted resources that's still being loaded
>>102218551This approach has the same problems as the tokenizer, knowledge gap of the thing that is actually being processed.
>>102218618
>knowledge gap of the thing that is actually being processed
this could probably be fixed by better options for what to include in data to be processed. image gen in st is pretty bad because it lacks options to fully realize the scene it's in
>echidna-13bIs it still considered the best model for local ooba/silly with 4gb vram?
>>102218702See: >>102217478
>>102218702no thats pretty old and was never the best. what are you looking to do?
>>102218618Train an additional adapter in between the newer modality model and the LLM so any part of the input the latter is unfamiliar with can be processed. Would take less resources than finetuning the LLM itself.
Is there a guide on what the difference is between Q8 and Q5, or what the symbols after them mean? Preferably with a visual guide, because I have no fucking idea what people mean when they say
>well you see, by separating the quasi symbols from the edge of the tokens, we can preserve the context surrounding them and improve k-mean efficiency by 5%!
>>102218767
>no thats pretty old and was never the best.
I was gone for quite a while and figured things have changed, that is why I am asking for your advice. Well, you are right, but it was surprisingly good and still had decent speed, considering the very limited vram I have.
>what are you looking to do?
Lewd rp stuff.
>>102218719
That is not what I asked for. I can't take a fucking computer with me when I'm out in the field!
>>102218862bigger Q is better. then comes small/medium/large: again, bigger is better
>>102218862It means that you should lurk more
>>102213492
>100 times the computer power level 2 quantum strawberry AGI
Holy shit
>>102218883Yeah, I understand that part now (although that took embarrassingly long time), but now I'd like to learn what they actually do to the model itself.>>102218885Lurking would do jack shit since none of you niggers ever discuss this on a level where idiots like me can understand it.
>>102214335I feel you, bro. Personally, I recently tried Chronos-Gold-12B, seems good. If speed is not a critical criterion, Command-R-35B seems good too.
>>102218880
https://huggingface.co/ArliAI/ArliAI-RPMax-12B-v1.1-GGUF
been playing with this for a day and it's alright. i don't think it's specifically for lewd but does it no problem. in general, look for mistral-nemo 12b tunes, should be about the same speed as old 13b
Hey, /g/bros. I am going to be honest I don't know anything about models or tech. I recently discovered chatbots, and I just wanted to ask are there any nice coomerbait models I can run locally on my shitty mac m1 air?
>>102213492SUPERDUPERINTELLIGENCE IN 2 MORE MINUTES AHHHHHHH
>>102218945post specs at least
>>102218921
>but now I'd like to learn what they actually do to the model itself.
Lower Q makes the model smaller (making it slightly faster due to less bandwidth), but lowers the accuracy of the prediction. S/M/L does the same within the same quant. That's all you need to know.
And pic rel
>>102218921
>but now I'd like to learn what they actually do to the model itself.
the simplest explanation is that they're different levels of lossy compression. if you want nerd level stuff, maybe here https://github.com/ggerganov/llama.cpp/wiki/Tensor-Encoding-Schemes and here https://github.com/LostRuins/koboldcpp/wiki#what-are-the-differences-between-the-different-files-for-each-model-do-i-need-them-all-which-quantization-f16-q4_0-q5_1
Hi, my friend asked me to come here. Are there any LLMs that can teach me Japanese?
>>102219028ChatGPT
>>102218931
Much appreciated, I will try it out, thank you. In a guide I saw https://huggingface.co/TheBloke/Utopia-13B-GGUF being mentioned, I will try that as well. Have you tried that one yet?
>>102219060nta. Models from TheBloke are old as fuck. Make your own quants or look for more recent ones.
>>102219060>10 months ago
>>102219060Don't. Really, not only it's so fucking old that it probably won't launch, it's also undi slop.
>>102218982
Thanks for actually replying!
How does a model get smaller? Are "nodes" being removed or merged? Or is the number of connections between them lowered? Do the numbers mean something or are they chosen arbitrarily? Also, what does the _M_L part mean?
Oh, and couldn't models be much smaller if they were optimized for specific things? Are they as large as they are now because they contain lots of tokens that most people don't really use?
>>102219000
Oh, those links already answer a lot of my questions. Thanks, anon!
>>102218967It's a M1 mac air with the apple chip. I don't know the specs myself, dude.
Hello, yes. I'm lost. What are local models? Are they dtf?
>>102219102
>Oh and couldn't models be much smaller if they were optimized for specific things? Are they as large as they are now because they contain lots of tokens that most people don't really use?
if you remove stuff you believe isn't useful, don't be surprised when the model gets even stupider than they already are
>How does a model get smaller? Are "nodes" being removed or merged? Or are the amount of connections between them lowered?
you lower precision, usually from 16 bit, to whatever
so instead of 0.123456789
you might have 0.1234
>>102219060thats old as well, but yes i've used it, it was ok, pretty comparable to other l2 13b's at the time. by old i mean llama 2 is the older base model, llama 3 and 3.1 are out now (8b for small). mistral-nemo is another newer model and being 12b is about the same size, but is a good bit smarter than older l2 13b, so look for things based on that or try llama 3 8b tunes. i think the bloke is dead too
>>102219039I mean the ones you can download I thought this was the general for this
>>102219132
>don't be surprised when the model gets even stupider than they already are
I wonder why this happens. If you take out all the medical terms, how would it become worse at generating adventure stories?
>you lower precision from 16 bit usually to whatever
>so instead of 0.123456789
>you might have 0.1234
Ah, something just clicked in my brain. Now I get it.
strawbery
>>102219154you thought wrong, bucko
>>102219132I don't think it's as simple as truncating or rounding, but that's the gist of it. Also not all parts of the model are quantized to the same precision because some parts might be more important than others.
>>102219060
that guide really needs to be updated... basically you want recent models where the #B params (in that case 13) isn't excessively higher than the number of GBs of VRAM you have (or RAM+VRAM if you're splitting).
at Q8 it's roughly a 1-1 relationship, at Q4 it's around 2B params/GB, you get the gist. ideally you want >Q4 unless you really want to run a bigger model.
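That rule of thumb can be sketched as a quick back-of-the-envelope calculator. The bits-per-weight figures below are rough averages I'm assuming per quant family (not exact numbers), and KV cache plus runtime overhead are ignored:

```python
# Rough model size estimate: bytes ~= params * bits_per_weight / 8.
# The bits-per-weight values are assumed approximations per quant family;
# real GGUF files mix tensor types, so treat the output as a ballpark.
BITS_PER_WEIGHT = {"f16": 16, "q8": 8.5, "q6": 6.6, "q5": 5.5, "q4": 4.5}

def model_gb(params_billions, quant):
    bits = BITS_PER_WEIGHT[quant]
    return params_billions * 1e9 * bits / 8 / 1024**3

print(round(model_gb(13, "q8"), 1))  # ~13B at Q8: roughly 13 GB, the 1-1 rule
print(round(model_gb(13, "q4"), 1))  # ~13B at Q4: roughly half of that
```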
>>102219102
>How does a model get smaller?
Nothing as complicated as that. Given a range of values, an appropriate offset and scale is chosen: for values from -1 to 1 on a tensor, with offset 0 and scale 0.25, you need just 9 values to represent the whole range. That weight now fits in 4 bits (down from the original 16 or 32). A whole tensor is done with the same offset+scale.
Not QUITE as simplistic as that, but not too far either. If you want to know more, you'll have to read code and documentation.
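A minimal sketch of that blockwise offset+scale idea, assuming a symmetric scheme in the spirit of llama.cpp's Qn_0 block quants (this is not their exact bit layout):

```python
# Blockwise quantization sketch: each block of weights shares one scale;
# weights are stored as small integers and dequantized as q * scale.
# Mirrors the idea behind llama.cpp's Qn_0 formats, not the real bit layout.
def quantize_block(weights, bits=4):
    qmax = 2 ** (bits - 1) - 1               # e.g. 7 for 4-bit signed
    scale = max(abs(w) for w in weights) / qmax or 1.0
    q = [round(w / scale) for w in weights]  # small ints in [-qmax, qmax]
    return scale, q

def dequantize_block(scale, q):
    return [scale * v for v in q]

w = [-1.0, -0.31, 0.02, 0.74, 1.0]
scale, q = quantize_block(w)
restored = dequantize_block(scale, q)
# restored values are close to, but not exactly, the originals:
# the rounding error per weight is at most half the scale step
```

That per-weight rounding error is the "lowered accuracy of the prediction" the posts above are talking about.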
>>102219168There are 3 r's in strawbery.
>>102219200
>character gets hit
>model doesn't know getting hit hurts since it has no knowledge of that anymore
simple example
>>102219174
yes obvs i'm massively simplifying
>>102219196source?
>>102219196lokbok
agi, tell me some fun facts about strawberries
crazy how brutal diminishing returns on LLM parameter increases are.like Nemo 12B is absolutely dumber than Largestral, noticeably so. it has failures of understanding a fair bit more often. but nowhere near to the degree you'd expect from it being more than ten times smaller
>>102219222no, nice trips thought. agi
What will you do when Local LLMs become deprecated?
>>102219246that's because largestral in particular is retarded and doesn't even understand time travel, compare it to a good 70b and nemo can't compete anymore
>>102219246l3 8b is 50x smaller than 405, 405 is nowhere near 50x smarter than 8b, that's pretty crazy to think about.
>>102219222Fun fact! Strawberries are actually vegetables. This is because, despite being sweet and genetically related to other citrus plants, strawberries actually grow underground!
>>102219246From what I tested, 405B isn't much better than 100B either, so we are probably at an architecture dead end.
>>102219266
>time travel
i'm surprised i haven't thought to try that. i usually prompt what year it is, say 80s, and with l2 70b's like miqu it then almost never mentions a character pulling out a phone, but might mention a house phone on the wall
>>102219200
>>character gets hit
>>model doesn't know getting hit hurts since it has no knowledge of that anymore
>simple example
I was more talking about very specific terms. Like the Latin terms for all the animals. Would removing those have a large effect on the quality of the generated text? I'm sure there's some contextual overlap, but would that really be worth the amount of params you could save?
>>102219317Or maybe Meta is just incompetent.
>>102219259I don't understand your question; I will literally create my immortal wife with my own hands.
>>102219259Become a hunter and seek you until the end of my life so I can make you my ERP chatbot
>>102219329
>I'm sure there's some contextual overlap, but would that really be worth the amount of params you could save?
yes, models should always be trained on everything you can get your hands on, everything, don't remove a single thing. that's why claude models are so good at rp: they know super niche stuff, like random fandom terms and the likes that can give your character tons of soul sometimes
>>102219329
>Would removing those have a large effect on the quality of the generated text?
>I'm sure there's some contextual overlap, but would that really be worth the amount of params you could save?
try phi models if you want "clean" models trained only on synthetic data and textbooks, the second you ask for anything outside of pure corpo slop they fall apart
>>102219222Discussing strawberries could inadvertently promote agricultural practices that may lead to over-farming, soil erosion, and habitat destruction, impacting ecological balance and species survival. Additionally, in some individuals, strawberries can cause allergic reactions, posing health risks. Maybe we could shift the conversation to sustainable farming practices or the importance of preserving natural habitats to protect diverse species and ecosystems.
>>102219246>>102219317>>102219335By every objective metric Largestral blows Nemo out of the water and 405B is a further step up. You just don't find them any better at the making-you-cum benchmark.
>>102219404>the making-you-cum benchmark.the only benchmark that matters
>>102219404>You just don't find them any better at the making-you-cum benchmark.Which is objectively the only use case for LLMs.
>large models aren't several times better than small models
Maybe you're not very discerning or not trying the right prompts. After being used to 123B and trying Nemo, I couldn't fathom how much more stupid it was. It might not be 10x but it's definitely at least 3x.
>>102219404phi models are also very good according to the average benchmark, but for coom they absolutely suck.
>>102219436>It might not be 10x but it's definitely at least 3x.That is what is being claimed yes, they are not proportionally better as their size might make one think.
>>102219436See >>102215976 it's still unusable if you give it any challenging scenario.
>>102219368Hm, interesting perspective.I think I'm starting to understand why OpenAI is expressing so much interest in creating specific training data.
>>102219436The only difference between Nemo and Largestral is that Largestral understands when it messes up and tries to hide that from you with creativity, desu.
>>102219087>>102219097>>102219099>>102219152>>102219183Thank you everyone, very interesting and helpful.I will check out some 12B mistral-nemo models and other llama 3 8B tunes then.
>>102219404Actually, I do. I care when my ERPs are make less sense. It's a turn off. And in that metric I do feel that yes actually, 123B is much better than 12B.
>>102219461This is a even better example, since it's unquestionably stupid: >>102216247
>>102219483remember newer stuff is higher context too, you aren't stuck at 4k anymore. even st finally updated their default to 8k. a lot of those old l2 13bs couldn't even be roped beyond 6k. these days you can get 32k-128k
>>102219317A dataset dead end. Will have to do something besides shoving random internet shit in it someday. can't hire the pajeets for that one
It's funny how people are realizing just how limited the English language really is.
>>102219436
yeah 3x sounds about right to me
I made the original post in this chain and it seems like everyone's interpreting it as "Largestral size models aren't worth using over 12B" but that isn't what I meant, I have Largestral and use it over 12B all the time
I just think it's remarkable that the difference isn't much bigger than it is
>>102219552Yeah, Anthropic already proved time and time again that synthetic data is the way to go.
>>102214608Still L3 70b storywriter, used 123b q4 for a while before switching back
>>102219554One day you'll have your 100% pajeet model. Don't worry.
>>102219554nahhttps://www.youtube.com/watch?v=NJYoqCDKoT4
>>102214608My private 12B fine-tuned on light novels. No, I'm not sharing it.
>>102219573(nta) what made you switch back?
>>102219583>>102219602God fucking damnit, I knew I shouldn't have erased that second sentence where I explicitly explain that I don't mean that other languages are better, because I thought people would intuitively understand that.You know what? My bad. I forgot to treat you people like the toddlers you are.
>>102219609Nah, I understand you anon, romance languages are just superior. A beta language like English can't compete.
>>102219609Your hallucinations aren't inherently obvious to anyone here.
>>102219552An alignment dead end. All it takes is one company to released a model that hasn't been pre-emptively lobotomized.
>>102219608NTA but I also went through something similar, and my reason is that the improvement (if any) wasn't enough to compensate for the speed drop
>>102219609You would then talk about the limits of the human languages, you retard. Learn to express your thoughts.
>>102219661No, because the English language can be improved.Words can be added. Removed. Modified.
>>102219657i went back to 70b myself but only because it has more soul. mistral large is very smart, adheres to prompts well, but its boring as hell, plus the speed difference
>>102219690All languages can do that. All languages can be improved.
>>102219703yeah, that too. Mistral models are too overconfident. I recommend trying CR+ (not the 08-2024 version though) if you haven't yet, it also has soul.
>>102219657>>102219703same reasons here, really slow and also seems to get really repetitive over long context. even if you DRY it just finds new ways to rephrase the same stuff, it doesn't want to do anything different. 70bs are less smart but at least rerolls are worth something
>>102218862
>>102218921
Model weights are stored as 16 bit floating point values. The first of those bits is the sign (tells you whether the number is positive or negative), the next 5 are the exponent, and the last 10 are the significand (the number that gets modified by the exponent). So an example of an FP16 value is 1011010101010100
Broken up into its parts, that's 1 01101 0101010100
And converted into decimal it's -(1 + 340/1024) x 2^(13-15), which is about -0.333
Q4 stores the same number as a 4 bit integer, where the first bit is still the sign and the next 3 are the significand, with no exponent. So each weight gets saved as an integer between -7 and 7 (real Q4 formats also store a shared per-block scale factor on top of that).
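That decoding can be checked mechanically. A small decoder for standard IEEE 754 half precision (sign bit, 5-bit exponent with bias 15, 10-bit fraction):

```python
# Decode an IEEE 754 half-precision (FP16) bit pattern by hand.
# For normal numbers: value = (-1)^sign * (1 + frac/1024) * 2^(exp - 15).
def fp16_from_bits(bits):
    sign = (bits >> 15) & 0x1
    exp = (bits >> 10) & 0x1F
    frac = bits & 0x3FF
    if exp == 0:                       # subnormals: no implicit leading 1
        val = (frac / 1024) * 2 ** -14
    elif exp == 0x1F:                  # all-ones exponent: inf or NaN
        val = float("inf") if frac == 0 else float("nan")
    else:
        val = (1 + frac / 1024) * 2 ** (exp - 15)
    return -val if sign else val

# The post's example pattern, 1 01101 0101010100:
print(fp16_from_bits(0b1011010101010100))  # -0.3330078125
```

Here the exponent field is 13 and the fraction field is 340, so the value is -(1 + 340/1024) x 2^(13-15).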
>>102219459
Sure, but people made it sound like big models are not great or that small models are somehow not that bad, when they are actually very, very bad. Like 123B might not be perfect for a lot of tasks I throw at it, and people are right to be critical, but it still does far more than the small models, and I suspect people who do not believe that just have not tried enough things.
>>102219556
Honestly I might even say much more than 3x, but it's kind of hard to argue here when it's not clearly defined what multiplying intelligence concretely means. If we're talking about the sheer number of facts that an LLM knows, I would argue that it actually does feel not far from 10x more. But perhaps reasoning is not 10x more and it's closer to 3x.
>>102219729
>Mistral models are too overconfident
this is probably why they seem so dry for rp. the entire response is dedicated to what i typed; it has very little will to add something or do something random no matter what you do with samplers. tuning doesn't help either.
>>102219764
>even if you DRY it just finds new ways to rephrase the same stuff
that's what all of the rep penalty stuff does: it can't really change what a model wants to output, so the model just finds the closest substitute. i don't see xtc fixing that either
>>102219703
>plus the speed difference
With a single 3090 and 64 GB of RAM I haven't been able to get a 70B to run any faster than an IQ3_XS quant of Mistral Large. Both are about 0.5 to 0.7 tokens/second. What quant / settings are you using for your 70B?
>>102219924
1.4 t/s on Q3_K_S at 16k context (i only have 16gb vram). it isn't fast, but usable. largestral is def slow though, 0.6-0.7 t/s. you're probably losing speed from using an iq quant. look for the non-iq version of the same model; it should be faster. i think iq quants help mostly with smaller models, 70b seems to be just smart enough to not mess up most of the time without the extra help
>>102219924
nta you're responding to, but how many layers are you offloading and what backend are you using? I have a 6950xt (16gb) and 32 gb ddr5 and am still able to fit ~45 layers of IQ3_M at 16k context on koboldcpp rocm and get around 1-1.5 t/s (although prompt processing is still pretty dogshit). I'm also using flash attention and context shifting to ease wait times between regens.
>>102220069
>I'm also using flash attention and context shifting
you can't actually use these features together. even if you selected them, one is going to cancel the other. on cpu fa causes more lag, so if you're getting 1.5t/s, fa probably isn't being enabled at all but context shift works fine
>>102219608
>>102219657
>>102219703
>>102219764
>>102219865
123b at 4t/s is alright for me, it's just that it doesn't want to follow the context's writing style no matter how hard I try to steer it by banning words, samplers, etc.
Sometimes I want the text to read like a 12-year-old's diary with a poor vocab range and shaky tenses, sometimes I want it to read like a pretentious English major's writing sample. It does neither. Loli in the diary talks and writes like a college student regardless, because mistral.
I had an idea. What if I uploaded 2 finetunes for nemo (Q8) and didn't say what finetunes they are, then made 3 polls: poll 1, which model is better; polls 2 and 3, which exact model each one is. What would happen?
(skipping the part where nobody is gonna do the experiment)
>>102220166
i'm pretty sure it's context quanting and ctx shift that don't work together on kcpp, don't think i've heard of fa blocking it
>>102214644
Yes, I do. Are the popular open source implementations like open webui that claim to do RAG SERIOUSLY not packaging it with the nifty vector db stuff? Is it SERIOUSLY just "trigger it and we silently paste the document into the prompt"? Because that would have sounded weak to me even a year ago. I've been meaning to fight my way through the unusable dockerbloat bullshit and give open webui a try just for RAG... but I am perfectly fine pasting my own documents into my own prompts if that's all it is.
>>102220132
Huh? What model do you use that actually follows the context writing style?
>>102220166
it says so in the ui
you would notice the speed hit on cpu too
>>102220180
>Is it SERIOUSLY just "trigger it and we silently paste the document into the prompt"?
Yes, yes it is. No post-processing or store-retrieval optimizations, nothing.
I fully believe that storing more and more information in models and making them ever larger is not the answer.
Providing a framework for these things to work within is.
>>102220207
yeah, ctx quanting requires fa on and ctx shift off; it doesn't say fa blocks ctx shift
also never used the ui so i wouldn't know about the tooltips
>>102220186
L3 storywriter > old CR+ > largestral. For smarts it's the other way around, of course
>tfw 3-4 t/s drops to 1-2 t/s when I get to 20k context
Ahhhhhhhhhhh
>>102220254
Interesting, what quant?
>>102220288
How the fuck do you not kill yourself having to wait MINUTES for generation to complete?
>>102220290
quantizing the kv cache at all requires more processing power. you won't notice when everything is in vram because it's so fast anyway, but on cpu it takes MORE processing power, so it actually slows down your already slow t/s. unless you're trying to squeeze some more context out of vram at the edge of what you have, you shouldn't use fa at all
>>102220290
but what i'm saying is that you don't have to quant to use fa, and as such ctx shift seems to work. there's no mention of fa blocking anything in the help:
>--quantkv [quantization level 0/1/2] Sets the KV cache data type quantization, 0=f16, 1=q8, 2=q4. Requires Flash Attention, and disables context shifting.
>--flashattention Enables flash attention.
>--noshift If set, do not attempt to Trim and Shift the GGUF context.
only quantkv blocks stuff
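for reference, the memory --quantkv buys back can be ballparked with the usual kv cache formula. the dimensions below are assumptions for a 70B-class model with GQA (80 layers, 8 kv heads, head_dim 128), and the per-block scale overhead of the quantized formats is ignored:

```python
# Back-of-envelope KV cache size:
# 2 (K and V) * layers * context * kv_heads * head_dim * bytes per element.
def kv_cache_bytes(n_layers, n_ctx, n_kv_heads, head_dim, bytes_per_elt):
    return int(2 * n_layers * n_ctx * n_kv_heads * head_dim * bytes_per_elt)

# f16 = 2 bytes/elt, q8 ~ 1, q4 ~ 0.5 (ignoring block scales)
for name, bpe in [("f16", 2.0), ("q8", 1.0), ("q4", 0.5)]:
    gib = kv_cache_bytes(80, 16384, 8, 128, bpe) / 2**30
    print(f"{name}: {gib:.2f} GiB at 16k context")
```

so for a model like that, f16 cache at 16k is ~5 GiB and q4 cuts it to ~1.25 GiB, which is why it's only worth it when you're vram-starved.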
>>102220163
First poll is the only one that matters if the final results can be trusted. The others are vectors for advertising and useless speculation.
>>102220288
I multitask. It does suck, but nothing to kill oneself over.
>>102220340
I'm assuming you're not using it for porn, then?
Fair enough.
>>102220207
>>102220069 (me)
Yeah, unless I'm misunderstanding what context shifting (whenever a prompt gets processed I don't have to reprocess the entire context when regenerating, unless I alter the already processed prompt) and flash attention (optimizing the memory footprint of long contexts) actually do, then I'm pretty sure? that it works on my machine.
>>102219690
>language can be improved.
Like calling everyone they instead of him or her.
>>102220353
Have an original thought for once, hylic.
>>102220348
>context shifting (whenever a prompt gets processed I don't have to reprocess the entire context when regenerating unless I alter the already processed prompt) and flash attention (optimizing the memory footprint of context lengths)
That is what those do, yes.
>>102220373
Oh, you were talking from the perspective of someone trying to get off. OK, yeah, I understand how it is for you. In that case, maybe I'd try switching to a 12B or something that's just permanently loaded in RAM. IIRC even on CPU that model size is still fast. Or maybe a 7B would do as well.
>>102220373
12B models work fast enough on my end with just 8GB of VRAM.
I should try out some 20B models, now that I think about it.
What happened to cheap V100s?
Every time I search ebay out of morbid curiosity they just keep getting more expensive.
>>102220387
Speed will be much lower, and there really aren't any good models between 10 and 70B.
>>102220395
2 more months, surely
>>102220282
Q6, Q4_K_M, IQ4_XX4 respectively when I used them
>>102220395
Don't worry, the AI bubble will pop any moment now!
>>102220348
you have it right. that 'processing prompt' step where it reads everything usually only needs to be done once, so you can generate like 10 swipes without redoing that step. it works great until you get to lorebooks and rag
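toy illustration of why that is (token ids are made up): the backend only has to re-evaluate tokens after the longest shared prefix of the cached and new context, so swipes that just append are nearly free, while a lorebook entry injected early in the context invalidates everything after it.

```python
# Only tokens after the longest common prefix of cached vs new context
# need re-evaluation; everything before it is reused from the prompt cache.
def tokens_to_reprocess(cached: list[int], new: list[int]) -> int:
    shared = 0
    for a, b in zip(cached, new):
        if a != b:
            break
        shared += 1
    return len(new) - shared

cached = [1, 5, 9, 42, 7]        # what the backend already evaluated
swipe  = [1, 5, 9, 42, 7, 13]    # a swipe only appends, prefix unchanged
edit   = [1, 5, 8, 42, 7, 13]    # early edit invalidates everything after it

print(tokens_to_reprocess(cached, swipe))  # 1
print(tokens_to_reprocess(cached, edit))   # 4
```

(kobold's context shift additionally trims the oldest tokens off the front when the context fills up, which is a bit more involved than this, but the reuse idea is the same.)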
>>102220395 (Me)
>>102220405
>>102220417
I think the issue is "entrepreneurs" scooping up all the cheap ones in order to cobble together shit like this to sell to morons with too much money. Yours for the low low price of 17000 USD.
>>102220069
Just tested again with llama.cpp b3581 CUDA version, Windows 11. Flash attention enabled. Mmap disabled. I have DDR4 instead of DDR5. Using a Q4_K_M quant of a Miqu derivative with 16k context. Temperature, min-p, and repetition penalty enabled.
# layers offloaded vs tokens/second (3 trials each, prompt processing excluded):
1 layer: 0.54, 0.56, 0.54 t/s
10 layers: 0.63, 0.63, 0.62 t/s
20 layers: 0.73, 0.73, 0.73 t/s
30 layers: 0.87, 0.86, 0.86 t/s
40 layers: 1.08, 1.06, 1.07 t/s
45 layers: failed to load, cudaMalloc failed (disabling virtual VRAM seems to have some effect?)
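fwiw those numbers are consistent with per-token time being roughly linear in how many layers stay on CPU. quick sanity check with a hand-rolled least-squares fit on the first trial of each row:

```python
# If per-token time is linear in CPU-resident layers, seconds-per-token vs
# layers offloaded should fall on a straight line. Fit y = a + b*x by hand.
data = {1: 0.54, 10: 0.63, 20: 0.73, 30: 0.87, 40: 1.08}  # layers offloaded -> t/s

xs = list(data)
ys = [1.0 / tps for tps in data.values()]  # seconds per token
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
a = my - b * mx
print(f"time/token ~ {a:.2f} {b:+.4f} * layers_offloaded (seconds)")
```

the slope comes out around -0.023 s per layer moved to GPU, so extrapolating toward a full ~80-layer offload would put you well under a second per token, which matches why full-vram setups are so much faster.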
>>102220628
>>102220628
>>102220628
>try to build a github project with python
>doesn't work
I FUCKING HATE THIS PIECE OF FUCKING SHIT GARBAGE LANGUAGE
HOLY SHIT WHO DESIGNED THIS PIECE OF CRAP?
WHY THE FUCK ARE THERE FOUR COMMANDS BUT TWO HAVE A RANDOM FUCKING NUMBER ATTACHED TO IT
WHY ARE THERE MODULES MISSING WHEN I AM EXPLICITLY INSTALLING THEM
ANSWERS TO THESE QUESTIONS AND MORE FUCKING NEVER BECAUSE NO ONE FUCKING KNOWS NOR CARES
FUUUUUUUUUUUCK
>>102220679
you are not alone, i also find dealing with python a massive pita, to the point i avoid it whenever possible, even when it means missing out on something that looks interesting. dealing with python usually isn't worth it
>>102220702
Yeah, same.
I saw something very cool so I decided to attempt it nonetheless, but nope.
I am so, so tired of Python.
>>102220637
>>102220069 (me)
strange, I use rocm and windows 10, so there may be some differences with how much ram W11 uses compared to W10, or maybe something with offloading to cuda that I'm unaware of
>>102220752
Massive skill issue
>>102220835
OH YEAH? THEN WHY DOES EVERY OTHER FUCKING LANGUAGE JUST WORK, HUH?
YOU TELL IT TO DO A THING, IT DOES THE THING
IT BITCHES ABOUT SOMETHING MISSING, YOU INSTALL IT, IT FUCKING WORKS
BUT NOOOOO, PYTHON NEEDS TO BE SPECIAL
WELL FUCK YOU AND FUCK YOUR SPECIAL NEEDS LANGUAGE