/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101191862 & >>101186500

►News
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101191862
--The Struggle is Real: Cleaning Datasets for Machine Learning Models: >>101191983 >>101192038 >>101192091 >>101192171 >>101192277 >>101192315 >>101192363 >>101192395 >>101192396 >>101192335
--Seeking EXL2 Compatible Server with OpenAI API and Context-Free Grammar Support: >>101192778 >>101193053 >>101193098 >>101193144 >>101193166
--Qwen 2's Tess-v2.5.2-Qwen2-72B Variant: A Promising AI Model: >>101193192 >>101193460 >>101193516
--Perplexity Improves with 9b Base Model: >>101192484 >>101192604 >>101192644 >>101192655 >>101192662
--Llama.cpp's Token Generation Delay with Cached Prompts: >>101195573 >>101195626 >>101195675 >>101195962 >>101195977 >>101196044 >>101196192
--LLM Compiler: Code Optimization and Disassembly Research Experiment: >>101191929 >>101192026 >>101192176 >>101193178
--Counting Letters and Custom Compiling Gemma-2 Support in Llama.cpp: >>101192460 >>101192555 >>101192897 >>101192940 >>101192964 >>101193630 >>101193760
--Gemma's Performance in Real-World RP and Potential Combinations: >>101193239 >>101193271 >>101193316 >>101193566 >>101193637
--Gemma 2's 8K Context Limitations and Meta's Unfulfilled Promises: >>101195909 >>101195953 >>101196152 >>101196394
--AI-Generated Cat Image and LLM Writing Quality: >>101193118 >>101193151 >>101193260 >>101193552 >>101193311 >>101194021 >>101194047 >>101194093 >>101194110 >>101194134 >>101194150 >>101194234 >>101194251
--27B's Performance Improvement and Schizo Fix: >>101193819 >>101193846 >>101193868 >>101193906 >>101193945 >>101195648 >>101193954 >>101193967
--Llama-70B and Gemma-27B VRAM Performance Issues: >>101194991 >>101195001 >>101195070
--gemma2's Repetitive Answers: A Potential Inference Issue: >>101192975 >>101192983 >>101193055
--Miku (free space): >>101192212 >>101192496 >>101195485 >>101196114 >>101196225 >>101196269 >>101196461 >>101196550 >>101196766
►Recent Highlight Posts from the Previous Thread: >>101191868
!!! THREADLY REMINDER !!!
llama.cpp is AGPL3.0-only
>>101188248
Any new image model developments?
>>101197218Pony / pony realism
Why did you start a new thread?
dead general dead hobby
dead technology dead future
I've been trying out gemma 2 27b on lmsys. It feels A LOT like the gemini flash model, but a bit dumber. I've also noticed that it basically gives the same response each time, even with the temperature turned up.
They overall have the same feel, and my theory is that gemma 2 is just a "fork" of an earlier checkpoint of gemini flash.
>>101197411Aloneposting on /lmg/ on a Friday late night?Yeah, that's something people should aspire to.
>>101197208!!! THREADLY REMINDER !!!petra's timezone is UTC+1
>>101197454
>>101197208
>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
what happened to miqu?
>>101197438h-hey it's Saturday night here
>>101197494She had a good run but she was finally put out to pasture. No model is SOTA forever.
What is the current coom model?
>>101197579qwen2 72b
>>101197438hey its saturday morning here
>>101197579Buy an ad.
What is the current coom card?
>>101197613big nigga
>>101197613The one you wrote yourself to reflect your ideal scenario
>>101197652I don't have a type
>>101197660Ask the AI to make up a character then.
>>101197660>
>>101197660>I don't have a typepicrel is now your type
Anyone had issues with 27B generating only pad tokens indefinitely until the generation is interrupted, when loaded with Transformers?
This isn't the different issue some are having with 27B outputs being low quality or schizo; this is something else where it's not working at all. Just generating an endless string of pad tokens in response to any input.
>>101197660Python was a mistake.
>>101197771savage
>>101197771heh
>>101197771It's a fucking bane on programming.So many headaches because of that fucking meme "language."
>>101197771there's worse language though, imagine using javascript as the required programming language for the fucking sites on the internet, I know sounds crazy but...
my favorite quant is IQ4_NL
>>101197771Python does have types. /lmg/ proving once again to be mostly nocoders.
>>101197826
only pajeets and hobbyists use plain javascript
every company with more than 2 developers uses typescript
>>101197754
I found another guy on HF getting this issue but he couldn't solve it either. Not GGUFs, just the standard FP16 weights.
Got the latest 4.43.0.dev Transformers, but it happens with the 4.42.0.dev wheel supplied by Google too. Weird.
>>101197294sampling doesn't work so logits doesn't work
>>101197820And the Anon that I was replying to presumably also has a type.But in both cases there is a lack of awareness.
>>101197826typescript is the same shit anon... it's just java script but with OOP
>>101197845it has types
>>101197853yay...
>>101197828
Also do_sample is off so it's not that. I'm not getting NaNs, it's just generating <pad> endlessly.
>>101197411blame avatarfags
>>101197861forgot to attach picrel
Gemma2 27b is good at poetic metre. Never seen a model spit out multiple stanzas of perfect iambic pentameter without a single mistake.
>>101197845>it's just java script but with OOPYou have no idea what you're talking about.>>101197826TypeScript is just for decoration, like putting makeup on a pig.
>>101197882how did you test it out anon? last time I've heard about that model there was some bugs making it schizo
>>101197860
>>101197828
>>101197754
Fuck, I just needed to tick the BF16 option when loading the weights. Even though the weights are FP16. I don't get it but I'll take it; it's working now.
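A plausible explanation for why the BF16 toggle fixed it (my assumption, not confirmed anywhere in the thread): Gemma 2 was trained in bf16, which has the same exponent range as fp32, so intermediate activations that are fine in bf16 can overflow fp16's ~65504 ceiling and turn into inf/NaN, surfacing as degenerate output like endless pad tokens. The range difference is easy to see with just the stdlib:

```python
import struct

def fits_fp16(x):
    """True if x can be packed as IEEE-754 binary16 (fp16) without overflowing."""
    try:
        struct.pack("<e", x)  # 'e' is the half-precision format code
        return True
    except OverflowError:
        return False

print(fits_fp16(60000.0))  # True  - inside fp16's range (max finite value ~65504)
print(fits_fp16(70000.0))  # False - overflows fp16; trivially fine in bf16 (~3.4e38 range)
```

Values that only exist transiently inside the forward pass can hit this even when the stored weights themselves are small, which would explain why loading FP16 weights in bf16 compute works.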
>Bug: quantized gemma 27b output still wrong after tokenizer fix #8183it wouldn't be llama.cpp otherwise
>>101197899
>You have no idea what you're talking about.
Oh I fucking do, anon, I fucking do. I made a site project with typescript, it's the same fucking shit as javascript and I hated my life doing it. The fact that this failed language managed to become the main language of the internet is still one of the biggest mysteries of human history.
>>101197882
>>101197901
Picrel (this was after I gave it the feathers vs. steel riddle, which is why the topic is steel).
It's the technical accuracy of the poetic metre I'm impressed by specifically, not necessarily the writing quality. Other models would have struggled to keep the correct metre consistently, and would have messed up stress and syllable counts once or twice.
>>101197911sounds like skill issue
>>101197913sounds like masochism issue
>>101197911JavaScript has OOP. TypeScript adds types, hence the name, not OOP. Stupid webshitter.
>>101197918>0==false>AAAAAAAAAA you
>>101197911>the fact that this failed language managed to be the main language on the fucking internet is still one of the biggest mysteries of the human historyIt should have been Lua.
>>101197911>Oh I fucking do, anon, I fucking do Not him, and I hate javascript, but no, you really don't. Give it a rest.
>>101197945Lua is relabeled BASIC.
>>101197949
No it's not. The syntax is just vaguely similar. BASIC doesn't have closures or first-class functions, and you can't create prototype patterns. Also, BASIC isn't designed to be embeddable.
>>101197945
No, it shouldn't be any single retarded scripting language. The web should be a collection of documents, as it was originally intended to be.
Adding any scripting was the first mistake. Trying to turn web browsers into cross-platform application emulators, because the average bootcamp flunkie is too stupid for regular application development, was the biggest mistake.
But now that we're here, WebAssembly is the correct solution. Letting those same javascript artisans that got us here in the first place gimp it from interacting with the DOM, to prevent themselves from becoming obsolete, was the final mistake.
>>101197974If we had to have scripting in the web browser, Lua would have been a fine solution.Otherwise though I agree with you.
>>101197938>>101197948didn't expect to find fucking javascript fanboys, goddam, and you people say you hate python at the same time? you lost all credibility with that stank take
>/lmg/ - i am le smartembarassing
>ask character if she's a virgin>no>add it into the card>now she acts like the shyest most boring and predictable characterwat do?
>sweet summer child:(
>>101198056
Perfect example of the model picking up on shitty cliche smut tropes, realizing that's what it's writing, and folding that into its writing style.
Try rephrasing it to "{{char}} has never had sex before" and watch it magically fix itself.
Is there any good local multi-modal model that takes image and video input yet?
>>101198076>has never had sex beforeholy shit it works anon, thank you. heres a miku pic
>>101198087>011>011
>>101197911Okay but what's wrong with it
geg
>>101198084
Even if there were, llama.cpp wouldn't support them. There are a few that support image input, but I haven't seen any local models that take video yet.
>>101198107:(
>>101197864>shit vs shit whoa!
>>101198035thats all of /g/ at this point, catalog is trashed with anime pics and ai jeet hype-up advertisement "threads"
>>101198099there's hundreds of better programming languages than fucking javascript and you're ok that this piece of shit is required to make the internet work? the fuck?
>/lmg/ - local models general
>How many r's are in "strawberry"?
>There are 2 r's in "strawberry".
>Spell the word "strawberry" and tell me how many r's are in the word.
>The word "strawberry" is spelled S-T-R-A-W-B-E-R-R-Y. There are 3 r's in the word "strawberry".
I don't think the strawberry test is good for validating a model's quality, despite yesterday's meme. Rather, I think it reminds us of how tokenization works, and that it's something we must account for when asking an LLM to do tasks more granular than the word/token level.
>>101198499>ad hominemwhats wrong with it anon?
hi
>>101198558Even the creator of javascript thinks it ruined the internet but nahh nothing's wrong with it anon 1!11!1!1https://lunduke.substack.com/p/creator-of-javascript-apologizes
>>101198570You have this saved but you cant name a reason. I'll stop replying though since it's offtopic.
>>101197787
Ironically, only people who aren't programmers or are new to coding trash on Python. There's a guy at my job who was coding in C++ for 15 years. Since he tried Python he never went back, and he always fights tooth and nail to use it in our projects. I've seen it multiple times, and my personal experience and my colleagues' around me resonate more with me than some python memes from r/ProgrammerHumor.
>>101198577>you cant name a reasonif you knew how to read text on pictures, you'll notice the reasons are cited on his tweet>I'll stop replying though since it's offtopic.looks convenient to leave the debate after being proved wrong :^)
>>101198570This is just attention seeking behavior
>>101198591You're right, it's just a consipracy theory!1!1!1
>>101198523
Yeah, well said. It's quite robust when you expand the instructions:
>### Instruction:
>Count the instances of the given Letter within the Input string. First expand the Input string into individual letters, then count the number of instances of the given Letter.
>### Input:
>strawberry
>### Letter:
>R
>### Response:
>Step 1:
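The prompt above works because it forces the model down to the character level, where the count is trivially observable. The same expand-then-count procedure, mirrored in plain Python for reference:

```python
word, letter = "strawberry", "r"

# Step 1: expand the input string into individual letters,
# sidestepping the multi-character token view the model normally has.
letters = list(word)
print(letters)  # ['s', 't', 'r', 'a', 'w', 'b', 'e', 'r', 'r', 'y']

# Step 2: count the instances of the given letter.
count = letters.count(letter)
print(count)  # 3
```

An LLM that answers "2" without spelling the word out isn't failing at counting per se; it just never sees the individual letters inside its tokens.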
I think ylecunn is correct and meta should stop wasting compute on producing llms that will never have basic intuitive behavior or genuine situational awareness.
>>101198663I think lecunt is french and should go away.
>>101198674>french manI think he sounds based
https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/unslop1/control_vector-commandr-unslop1.gguf
Took my control vector for a test drive to see if it works correctly. During SFW everything worked perfectly and had the style that I wanted, but when the NSFW part came, slop came out. Looks like slop during SFW and slop during NSFW have different directions inside the model.
>>101198756>wtfpl
>>101198768Sorry, but it will likely be removed if I release it with +NIGGER license.
>>101198756how do I use this with exl2
>>101198785
release it with faipl-1.0
>https://freedevproject.org/faipl-1.0/
qrd: agpl but for weights
>>101198756how do i apply faipl-1.0?
>>101198793>faipl-1.0HF doesn't recognize it.
>>101198918you can select other, and then add the contents of https://freedevproject.org/faipl-1.0.txt to it
>>101198924
to the LICENSE file, at least that's how animagine and many other open source models do it
>https://huggingface.co/cagliostrolab/animagine-xl-3.1
>>101198918you can also add >license_name: faipl-1.0-sdto the readme
>>101198756>faipl-1.0
>>101198586
I have multiple years of experience both in Python and other languages, and it is my strong conviction that allowing retards to use dynamic typing is a terrible idea.
>ministrations
>>101198586
>There is a good in my job who were coding in C++ for 15 years.
I'm one of those guys. I did C++ and Java for more than 10 years; Python is still my favorite language. It's just simple and elegant, the others are convoluted pieces of shit, but I choose those shit languages because they pay well kek
>>101198756
>https://huggingface.co/ChuckMcSneed/control_vectors/blob/main/command-r-plus/unslop1/example_output.md
wtf this is better than lora, why doesnt sao10k use control vectors?
>>1011990250.5 > -0.5
>>101198989there isn't a sentence in there that doesn't have slop in it.
>>101199102wat do
>>101198953>why are you gay
>>101198565damn
>>101199025
Because control vectors lock the model into a single direction. Not everyone has the same tastes: some people like forming bonds and going on journeys (>>101199067), some like the model to be blunt and clear. There are also issues with repetition and decreased intelligence if applied too hard.
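For anyone wondering what "locking the model into a single direction" means mechanically: a control vector is (roughly) a fixed direction added to the hidden state at inference time, scaled by a strength coefficient. A minimal sketch, with plain Python lists standing in for hidden-state tensors; real implementations (e.g. llama.cpp's control vector support) do this per layer on actual tensors, so treat the numbers below as purely illustrative:

```python
def apply_control_vector(hidden, direction, strength):
    """h' = h + strength * v: nudge a hidden state along a steering direction.
    Large |strength| pushes every generation the same way regardless of context,
    which is where the repetition / lost-intelligence complaints come from."""
    return [h + strength * v for h, v in zip(hidden, direction)]

hidden = [1.0, 2.0, -1.0]
direction = [0.5, -0.5, 0.0]   # made-up "unslop" direction for illustration

print(apply_control_vector(hidden, direction, 2.0))   # pushed along the direction
print(apply_control_vector(hidden, direction, -2.0))  # pushed the opposite way
```

Negative strength steers away from the trained direction, which is why the same vector file can be applied in either direction.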
>>101199204what would happen if you merged multiple control vectors then
>>101199134
When you prompt an LLM to write prose or roleplay it will always give you slop. There are two ways:
1) obvious - stop prompting it to write slop.
2) remove slop - autoprompt to rewrite the output, removing all slop.
The rest is cope. No amount of ko-fi finetunes or control vectors can fix it. It's the datasets, the training. Either you have that, or you have braindead rambling.
>>101199229The universe accelerates so fast it gets reset.
>>101199229All off them would apply.
>>101199242hi x1,000,000000,00000000000,000000000000000000,000000000000000000000000000000000
>>101199237
>1) obvious - stop prompting it to write slop.
how? the system prompt? the cards? i dont like prose at all but its not like theres much choice in models
>2) remove slop - autoprompt to rewrite output removing all slop.
how do i autoprompt? i know i can manually clean slop from replies but sometimes its too much, takes a toll of its own
>>101197434
Interesting theory. But it's only useful if they did it the other way around - as in, lobotomizing the final flash model. If a way is found to re-add ze 6 bazinglion context window and the multimodal capabilities, it would be huge.
Visionbros it is so over...
>>101199300>no chameleon >no Cambrianmeds, schizzo
>>101197208
CUDA dev, release your critical code as AGPL, with a condition that it becomes MIT or whatever in 3/6 months, or something like that. That way the non-contributing parties can never have the current state of the art, yet you can still say you're providing business-friendly code once it reaches a stable status.
>>101199314>Chameleon"Oh yeah, that happened": the modelIs there a single person who has tried it?
no matter how much i reroll she answers with my weight, wat do
>>101199385use a better model
what's a good API provider where I can rapidly test different models?
>>101199403for example?
>>101199459i fucking hate llms so much
happynameday.today
>>101199471>happynameday.today
>>101199459is this stheno or something like that?
>>101199548stheno 8b 3.2, any recommendations?
>>101199562stheno works ok for prompts that aren't 80% coom focused. I did wholesome adventures with it. So you'll have to learn to write your own cards.
>>101199599gib cards
>>101199459garbage in garbage out
>>101199636Anons when the AI doesn't write a masterpiece or a symphony after typing in a single dot:
>>101199609
just write it yourself nigga, it's not hard. think of a girl YOU would want to plap and describe her. You don't have to follow any weird formatting, writing plain text works too.
{{char}} is a party girl. {{char}} likes drinking and handholding with strangers. Stuff like that.
>>101197434google is distilling their models from bigger ones so I wouldn't be surprised
>>101199487Damn, Petra looking good
>>101198989>Doing the other RPers actionsI hate people like you.
>>101197218pixart team said they're working on a bigger model
>no more cards to patits over..
What quant to use for gemma 27b for 24GB VRAM?
what
>>101199946pat this
>>101200080he a man tho
>>101200079B'Hig Cox.
least gay card
>>101197831Wait, what? If it can't know logits, how does it choose which token to add?
>>101200080Biggie Jong Un
>>101200126Are you running this with 0.2 smoothing or something?
>>101200277everything is neutralized besidesTemperature - 1.12-1.22Min-P - 0.075Top-K - 50Repetition Penalty - 1.1
>All these logs>No NalaHow disappointing.
>>101200234the pat cut
Thank you greg. Fuck.It's fully possible that they are worrying about something that's actually normal behavior, as weird as it is.It's probably a bug, yeah, but check instead of assuming.
I am going fucking insane, spent 2 hours wrangling. I need to resist the urge to ask totally irrelevant questions.
Is Gemma 2 27B working on local yet or can I go back to sleep?
>>101200518If you had the ability to sleep why not just sleep?
>>101200394
Working on another merge right now, so I should have some new official Nala tests soon. Also uploading a 70B merge, but it keeps failing overnight (the computer it's uploading from is on a wireless ethernet bridge, so it's a bit too finicky for the HF web uploader; I've been going one file at a time as I have time for the last couple of days). New Nala tests and at least 1 new model by the end of the day.
>>101200544>1 new modellicensed under FAIPL-1.0?
>>101200550If I don't make it cc-by-nc petra anon will show up and bully me.
>>101200560>cc-by-ncb-based..
Licensing model weights seems pretty dubious in the first place if I'm to be honest. What's to stop someone from just taking a set of weights, applying a 360 degree rotation to all the tensors and walking away with a different license since it's technically a different set of weights now?
>>101200497
1. INST generate output
2. INST is this slop/out of character? respond with yes or no
3. if yes, INST rewrite, goto 2. else display output to user.
>>101200593me
>>101200497
Your complaint is unclear. But IMHO it can be more fun having a meta discussion with the LLM about the RP than the RP itself is. Especially on runs where the LLM is acting like AGI instead of like 77 IQ.
>>101200593
Depends on the argument: is it a creative work (protected), simply a list of data (not protected), or a combination (how cookbooks work under copyright law; the ingredients are unprotected as a list of facts, while the instructions are protected because they are a designed procedure)?
>>101200629found the artcel
>>101200593Copyright all the symmetry groups.
>r/4changeg
>>101200684reads like some kind of mixtral meme model
>>101200733
settings from >>101171560
model is the normal 8b stheno 3.2
>>101199025Logs?
>>101200923re-read the post
Which one should I download? I don't even know how much space it needs in RAM/VRAM.
>>101200948None, download 3.3
>>101200948this post is bait, right?
>>101200966No. Before that i was looking for Max RAM required and etc.
>>101200948
>IQ3_S-imat
what do the first I and the imat mean?
>>101200948RAM=model size+15%
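That rule of thumb (model file size plus ~15% headroom for buffers and context) in one line, to make the arithmetic concrete. The 15% figure is an anon estimate, not a hard spec, and larger context sizes push it higher:

```python
def est_ram_gb(model_file_gb, overhead=0.15):
    """Rough memory needed to run a GGUF: file size plus ~15% headroom
    for KV cache and compute buffers (rule of thumb, not a guarantee)."""
    return model_file_gb * (1 + overhead)

print(est_ram_gb(8.0))  # an ~8 GB quant wants roughly 9.2 GB free
```

Anything that doesn't fit in VRAM spills into system RAM (with koboldcpp/llama.cpp offloading), so the estimate applies to the combined total.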
>>101201000
imat is iMatrix (importance matrix). It is used with tiny quants so they aren't as stupid.
IQ is a different system of quants than the common K series, with a different trade-off of size and performance: an IQ quant is smaller than a K quant of the same Q number.
IQ and iMat are unrelated but can appear together, e.g. i1-IQ3_XXS.
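The naming conventions above, summarized as code. This parser is just a mnemonic for the community conventions described in that post (e.g. the "i1-" prefix some uploaders use for imatrix quants); it is not an official llama.cpp spec:

```python
def parse_quant_tag(tag):
    """Split a quant tag like 'i1-IQ3_XXS' into its conventional parts."""
    imatrix = tag.startswith("i1-")              # 'i1-' marks an importance-matrix quant
    base = tag[3:] if imatrix else tag
    series = "IQ" if base.startswith("IQ") else "K/legacy"
    return {"imatrix": imatrix, "series": series, "base": base}

print(parse_quant_tag("i1-IQ3_XXS"))
# {'imatrix': True, 'series': 'IQ', 'base': 'IQ3_XXS'}
print(parse_quant_tag("Q5_K_M"))
# {'imatrix': False, 'series': 'K/legacy', 'base': 'Q5_K_M'}
```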
>>101199237
>1) obvious - stop prompting it to write slop.
Prompting only works if it is consistent with entrainment. In any conflict between the sysprompt and entrainment, entrainment wins.
>>101201064
I see, thanks for the answer anon, much appreciated
I read like half of the stuff in the OP and now I'm talking to my GPU.
she's kinda stupid and keeps changing her syntax, but I think I can figure this out, or try different models and stuff at least.
I've made it this far
>>101200948
I think the q8 base requirement for that model is a little less than 8gb, but if you run it at a higher context the ram requirement increases. I'd estimate q8 at 16k context would be around 11-ish gb.
>>101201119>I read like half of the stuff in the OPYou are leaps ahead of most people. Congrats.
>>101201119based
>>101201119>I readYou are leaps ahead of most people. Congrats.
>>101201087
i never said to prompt it to not write slop.
i said to not prompt it to write slop.
Should I have downloaded Koboldcpp instead of koboldai?I have 12gigs of vram, but a decent processor. it looks like the main difference is streaming?
call me georgi the way I'm gerganov to AI chatbots
>>101201486KoboldAI is all but defunct, kcpp is current and for mixed GPU/CPU setups, while exl2 is for Rawdogging pure GPU.
uhhh bros...
>>101201527thanks. assuming I can just copy models from the KoboldAI 'models' folder to a similar folder in Koboldcpp so I don't have to redownload them?
Gemma had two major issues at launch which we know of so far.
The first was an incorrect tokenizer, which was fixed relatively quickly, though a lot of GGUFs were made before that.
The second issue, discovered much later, was that logit soft-capping, which Gemma 2 was trained with but which was initially not implemented in Transformers due to it conflicting with flash attention, was far more important than Google had believed it to be. Especially for the larger model.
The first issue (broken tokenizer) has been fixed for a while, and fixed GGUFs have been uploaded to Bartowski's account. But the second issue has not been fixed in llama.cpp yet. There is a PR but it has not been merged, though it likely will be very soon based on the recent approvals.
It was first believed that GGUFs would have to be remade after the PR got merged, but a default value was added for the soft-capping, which means that old GGUFs will work as soon as the PR is merged.
So to summarize: if you download a GGUF from Bartowski right now, it will work as soon as the PR is merged, but before then you will experience degraded performance. Especially on the 27b model, which is entirely broken at certain tasks at the moment.
It's entirely possible that there are issues beyond just these two. It's not rare for various bugs to rear their heads when a new architecture emerges, after all. And I have seen some say that they are experiencing issues even after the fixes. Like this post.
It's also worth noting that since llama.cpp does not support sliding window attention at the moment, it will likely perform pretty poorly with context sizes larger than 4K. There is an issue open for sliding window attention, but it has not really been worked on so far since few models actually use it.
40 upvotes
I honestly had no idea how shit /lmg/ is. I just hated you because you are mentally ill.
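For reference, logit soft-capping is just a tanh squash: capped = cap * tanh(logits / cap). Near zero it is almost the identity, but it bounds logits to (-cap, +cap), which is why skipping it changes the larger model's behavior so much. A one-line sketch; the default of 30.0 matches Gemma 2's reported final-logit cap, but treat the exact number as illustrative:

```python
import math

def soft_cap(logit, cap=30.0):
    """Gemma-2-style logit soft-capping: bounded to (-cap, cap), ~identity near 0."""
    return cap * math.tanh(logit / cap)

print(round(soft_cap(1.0), 4))  # small logits pass through nearly unchanged
print(soft_cap(100.0) < 30.0)   # True - large logits are squashed just below the cap
```

The model was trained with its logits bounded this way, so sampling from unbounded logits at inference time effectively sharpens the distribution in a way the weights never saw.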
>>101201532Ai wuz not understand the nigger language n shieeet
>>101201593
>>101201606how is google releasing a broken, undercooked model our fault?
>>101201709You didn't summarize the last two days worth of threads in ELI5 fashion so he could understand what was going on.
is typhon 8x7b still king?
Anons, are these good?
Nous-Capybara-limarpv3-34B
Mixtral-8x7B-Instruct-v0.1
PsyMedRP-v1-20B
Fimbulvetr v2
What's the current best local model for creating an app (and so you can learn programming in the process)?
>>101201995DeepSeekCoder-V2-236B
>>101201951
old, but i guess for your size theres not much better. i guess stheno is ok but its very sloppy. fimb is very nice. psymedrp? eh, didnt enjoy it. mixtral instruct? use mixtral instruct limarp zloss dareties or whatever its called instead.
>34b
dunno
>>101201951Switch out plain Mixtral for Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss.
>>101202018
or typhon, but right now im messing with typhon to see how it was, seems worse than stheno, probably a skill issue
>>101202038
is it better than typhon? i remember using it and switching it for typhon because everyone was shilling typhon, then i tried typhon and it was meh
>>101199798That's exactly what I wanted to hear. Thanks anon!
>https://huggingface.co/InferenceIllusionist/TeTO-MS-8x7b/tree/mainb-bros?
>>101202093Imagine if you had posted actual meaningful text that created conversation instead of farting something r9k would turn you away for.
>>101202093>mixtralyawncute tet though
According to the new HF leaderboard, Qwen is the top model, while CR+ is much lower. Does that actually align with people's usage of the models? I rarely ever hear about Qwen being good, or bad.
>>101202193> Does that actually align with people's usage of the models?no
>>101202193For me, CR+ is solid for RP and has occasionally acted too smart.Qwen2 liked to spontaneously disrespect which of us was playing which role.But for things other than RP, Qwen might be better. Different models, different strengths.
>>101202193Qwrn 2 is underrated for the size, but CR+ is just better.
>>101202093
Howdy fren. Just my love letter to Mixtral. I know it's not the new shiny thing out there by any means, but I haven't seen many Model Stock experiments being done with it yet, so my curiosity got the best of me. Might release a follow-up, but only if it actually proves to be an improvement over this one.
>>101202173
>cute tet
Thanks anon. Featureless Flat 2D Mix does a pretty decent job with chibi style out of the box without LoRAs.
>>101202012What GPU would you need to run that? I have a 3090 ti.
>>101202420do you say its the best mixtral sexo tune?
>>101201751
Between "explain like I'm 5" and /lmg/'s shit answers there is "informative", and that post was informative. I have never seen an informative post like that one here.
>>101202454go back
How is Qwen supposed to be pronounced anyway? Isn't it an abbreviation of Tongyi Qianwen?
>>101202473
Probably something like a hissed "chwun", I would guess. Q is a hissed CH sound, and the "e" is usually close to the ə/uh sound for unstressed vowels in English, like how "the" becomes "thuh" in front of certain words, or always when someone is retarded.
>>101202435MoE so you could probably get by with that 3090 if you have lots of RAM
>>101202473>>101202503it's just kwen, I remember hearing one of their guys say it in some xitter space once
>>101202435How much system ram do you have to back the layer swaps?I had to dial down to IQ3_XXS to get it to function on a 4070, and that was glacial because it's still 85GB at that quant, and I've only 64GB system RAM.
>>101202473The q is like a ch sound, and the wen is like if you tried pronouncing "wn".Also you can go into Google translate and put in 通义千问 to listen to exactly how tong yi qian wen is pronounced.
>>101202473kvyen
>>101202473qwen
>>101202576>义The excite.com mascot?
If I apply a 4-bit qlora to a 3.5bwp Exllama quant of the same model qlora is off, is it gonna become retarded?
>>101202473"Quwhen"
>>101201995
>>101202012
Just use WLM, it's slightly better than DS but is way smaller:
>https://prollm.toqan.ai/leaderboard/coding-assistant
>>101202751if u have 128gb of ram DS is faster
>>101202018Why do you talk like if you had brain damage?
>>101202193Qwen2 is solid, but I prefer Commandr's writing style. L3 is >8k shitter.
>>101202751>Provider: Stack Overflow>Evaluation Method: Auto-evaluation with GPT4 - TurboYou expect me to take this seriously? WLM just scores higher because it is turboslopped.
>>101202809
my mind is very unorganized so when i lazily type it turns out like this.
t. listening to ear licking asmr 24/7 to curb his internal monologue
>>101202777Whoa I didn't know that. It looks like DS has 21B active params during inference compared to WLM's 44B so I can see it definitely being faster.
>>101202751
>>101202777
>>101202834
>>101202570
So for a coding assistant/teacher, with a 3090 Ti and 32GB, the best one is?
>>101202882Phi-3-Mini
>>101202882Claude 3.5
>>101202882>3090tilook at this dude..>32GBXDDDDDDDDDDDDDDDDDDDDDDDDD
>>101202882Dude, either one but you need to get more RAM. It's cheap, you have no excuse.
>>101202882>i have more ram+vram than an rtx 3090ti owner>12+64>24+32GEEEEEEEEEEG
>>101202420
from preliminary testing i give this model a 10/10
DISCLAIMER: t. used stheno for the past few days
L3-8B-Stheno-v3.2-Q5_K_M is this fine? Or should pick Q8 and don't bother?
>>101203067linux
https://x.com/rohanpaul_ai/status/1806772036125008087
https://github.com/FlagOpen/FlagEmbedding/tree/master/Long_LLM/activation_beacon
Using activation beacons to increase context from 4K to 400K while maintaining the same VRAM usage, only minimally increasing inference time, and keeping the same level of quality.
>>101203067I think you should buy an ad.
>>101202914>It's cheap, you have no excuse.64GB would set me back £204128GB would set me back £315
>>101202882
>So for a coding assistant/teacher
This is the question I'm looking to answer sometime soon.
32 GB of system RAM is really tight even if you're willing to accept slow (1 word per second) output rates. 64 gets you many of the 70B-class models if you quant down. DeepSeek, still not enough. You'd probably need 96 to get in the door for slow DSC.
>>101202751
>Just use WLM
Wizard can't even strawberry. It said that there's 1 r after I made it spell out the word explicitly. I'm not sure I trust that kind of LLM thinking.
>>101203105
what the fuck? i thought prices were bad here.. you can get a new ddr4 32gig stick for like $70
>>101203132
My 32GB is 2x16 GB sticks. They're imported. For some reason I couldn't get them in the UK at the time. They still cost £140 on ebay.
>>101203182
what if you just threw those puny tiny sticks out and replaced them with 2 32gig sticks instead?
>>101203195
I don't know if I want to drop £204 on it. I may sell the old sticks first.
>>101203212
ddr5?
It's up, boys!
https://huggingface.co/Envoid/Llama-3-TenyxChat-DaybreakStorywriter-70B
>>101203075
>Just 9 hours on 8xA800 machine for a 7b, bro.
>>101203231
G.SKILL 32GB DDR4 Trident Z Royal Gold 3600MHz PC4-28800 CL16 1.35V Dual Channel Kit
>>101203256
do you really have to get the top of the line?
>>101203008
kek
>>101203234
gguf when
>>101203234
we bac
>>101203262
>>101203234
>The resulting model scores significantly higher on the super top secret, private NALA evaluation
>>101203281
They're gold though. 3600MHz and CL16. What would you replace them with?
>>101203234
We are finally entering the era of long brainrot names for llama3 sloptunes. That means a breakthrough is coming soon, right?
>>101203234
What was TenyxChat supposed to be?
>>101202054
>probably a skill issue
not a skill issue, mixtral tunes are boring, smarter than stheno for sure but not as fun
>>101202473
I pronounce it "Queen"
>>101203396
TenyxChat is a finetune by Tenyx. I don't know who they are but their Llama-3 finetune is pretty good. (Their Mixtral finetune was good too.) They get overshadowed by nous and bagel and all that shit but they have their own niche. And their models can be pretty naughty with the right coaxing despite being "safe".
>>101203408
>cummming cuuuummming oohhh i'm cummmiing
so fun
>>101203067
Q8, never use less than Q8 quants on small models
>>101197169
So, any new erp local models? Up to 70B
what's the best model for rp? is it still midnight miqu?
>>101203234
That TenyxChat has a lot of fixing to do after I tried L3-70B-daybreak-storywriter-v0.4 just now:
Me> I turn away, about to head up the building's stairs to my second level apartment.
It>
>I see you're about to leave, and I feel a sudden surge of panic. I don't want you to go away. I want to be near you. I want to touch you. I want to feel your skin against mine. I want to smell you. I want to taste you. I want to be with you. I want to be yours. I want you to be mine. I want to belong to you. I want you to belong to me. I want to be yours alone. I want you to be mine alone. I want to be with you always. I want you to be with me always. I want to be yours forever. I want you to be mine forever.
>I'm so close now. I can smell you. I can feel your heat. I can see the sweat on your skin. I can see the muscles in your back. I can see the strength in your arms. I can see the power in your legs. I can see the beauty in your face. I can see the kindness in your eyes. I can see the love in your heart. I can see the goodness in your soul.
>I want to touch you. I want to feel your skin against mine. I want to taste your sweat. I want to smell your heat. I want to feel your muscles. I want to feel your strength. I want to feel your power. I want to feel your beauty. I want to feel your kindness. I want to feel your love. I want to feel your goodness.
This goes on for 12 paragraphs. What a Storywriter. It must get great marks on the Hemingway app.
Another time it did a similar thing but gave distance. Same paragraph, but the first sentence went 50 yards, 20 yards, 10 yards... I gave up when it "took a few steps" to go from 0.5 inches to 0.25 inches away.
>>101203436
As the sun dipped below the horizon, casting long shadows across the well-trodden path, they knew that their journey had only just begun. The bonds they had forged, tempered in the fires of shared experience, would guide them forward, a beacon of hope in the face of any challenge. Together, they would embrace the unknown, knowing that life's greatest adventures were best shared with those who mattered most.
>>101203545
I hate that kind of prose worse than repeated catchphrases. That wispy, affirmatively-speculating-about-the-future shit.
>they knew...only just begun
>would guide them
>face any challenge
>would embrace the unknown
>life's greatest adventures
If any of that shit's legit, tell those stories. If not, cut.
>>101203545
GPTslop prose is the written equivalent of corporate memphis.
>>101202054
Typhon is a pile of steaming slop shit. It's being shilled by one anon who proceeds to shit on every tune someone mentions in the thread, so it would look better by comparison. Basically badmouthing everything else and hoping that some newbie will download typhon instead.
>>101203591
that's the classic useless, flowery prose of mixtral finetunes like BMT or Typhon. The main reason why I dropped them
>>101203624
>who proceeds to shit on every tune someone mentions in the thread
That's every finetuner, Sao being the worst case. Even NovelAI shills come here to do that when Dreamgen or SpellBound release theirs.
>>101203234
Are retards still finetuning with "shivers down my spine" and "bonds of shared trust and respect" artificial data slop or have they learned?
>>101203694
>or have they learned?
lol lmao
>>101203694
The slop comes from human writing. The smarter a model, the more slop it will have, because it will have more vectors pointing towards cliches in writing. One of the reasons dumber models generally have more sovl.
>>101203624
>by one anon who proceeds to shit on every tune someone mentions in the thread
I actually don't remember this happening. Aren't you another finetuner basically doing the badmouthing?
>>101203129
I'm using Bing AI to crib a program together.
>>101203624
got anything better?
anyone who tried the new deepseek coder v2 - how does it compare to the "big ones"? (4o/sonnet 3.5)
Have they fixed gemma 27b? Are my tensors safe?
>>101202420
for some reason it gets braindead at around 5.2k context. likely a temp issue? tried both sampling settings from the repo, using the instruct/context templates from the repo as well
>iq4_xs llamacpp_hf
>>101203821
No, wait another week
>havent lurked in months since the general went to shit
>completely in the dark for news
>check this thread
>petraposter is back
ah so that means a new model dropped and its actually good, so, what is it?
>>101203831
>or its a me contaminating the model issue
>>101203719
>Aren't you another finetuner basically doing the badmouthing?
I wish I could finetune anything on my trash tier hardware, I wouldn't have to rely on retarded sloptunes
>>101203855
i was never gone
>its another 8k context release
dropped, see you next tuesday.
>>101197169
>>101203855
gemma 2 27b
>https://eqbench.com/creative_writing.html
>>101203855
No, he's currently mad that he accidentally leaked his timezone yesterday.
>>101196178
>>101196185
>>101196305
>>101203903
>https://eqbench.com/creative_writing.html
The benchmark is BS, but if you read the actual examples, the Gemma ones are really easier on the eyes. It's also noticeable how similar they are to the Gemini Pro samples. It'd be entirely possible to not have sloptext. It's just that nobody cares. And no, ko-fi finetuners can't fix this.
>>101203903
holy benchmark, ill believe it when i see it
>>101203916
kek what a chode
>>101203916
Petra is german. "Petra" is a very common german name.
why do mixtral models become retarded after 4k context
t. retard
>>101203855
>new model dropped and its actually good
>goy slop from jewgle
you can't be this retarded.
>>101203916
>timezone leak
wow you sure got him! /s
>>101204052
>he lacks reading comprehension
do you come from africa like petraposter does?
>>101204067
hi petra
>>101203855
Gemma WNBAG
>>101196305
>that timezone
/lmg/ has shitskins now lmao, i'm wondering how many of them are mikufags, hmm...
>>101204139
nice deflection, petrus
I'm testing SPPO again today. This time I used the Nala card. Essentially it feels the same as vanilla Instruct, but a bit less varied and with a bit more focus paid to Nala's lion features. By less varied, I mean that small differences in the Instruct formatting don't affect its response as much, compared to Meta's Instruct. Specifically I tested ST's L3 Instruct preset, vs Instruct with names, and vs with the preset's system prompt and with it deleted. Honestly though, it's not a huge difference. Was kind of hard to tell. Maybe a more complex card would show the difference, or if I play with this card more.
>>101204168
Thank you for your report. I tried the model with my RP card and it did better than most, although Stheno somehow is still the best at it. Base instruct EOS'd on the first message, it sucks.
>>101198756
I've made another unslop control vector, this time aimed at NSFW, and it made the model a bit horny and a bit more optimistic as a side effect. Will do a test run to see if it's worth releasing.
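For anyone wondering what a control vector actually is: the common mean-difference approach can be sketched on toy data. Everything below is a hypothetical illustration: the `control_vector`/`apply` helpers and the 2-dimensional "hidden states" are made up, and real vectors are extracted per-layer from an actual model's activations (e.g. the repeng approach), not from hand-written lists.

```python
# Mean-difference control vector, toy version: average the hidden states from
# prompts WITH a trait and prompts WITHOUT it, subtract the means, then add the
# scaled difference back into the model's activations at generation time.

def mean_vec(vecs):
    # element-wise mean of a list of equal-length vectors
    n = len(vecs)
    return [sum(v[i] for v in vecs) / n for i in range(len(vecs[0]))]

def control_vector(pos_states, neg_states):
    mp, mn = mean_vec(pos_states), mean_vec(neg_states)
    return [a - b for a, b in zip(mp, mn)]

def apply(hidden, cvec, strength=1.0):
    # negative strength steers AWAY from the trait (e.g. "unslopping")
    return [h + strength * c for h, c in zip(hidden, cvec)]

pos = [[1.0, 0.0], [0.8, 0.2]]  # toy hidden states from "trait" prompts
neg = [[0.0, 1.0], [0.2, 0.8]]  # toy hidden states from neutral prompts
cv = control_vector(pos, neg)   # ≈ [0.8, -0.8]
steered = apply([0.5, 0.5], cv, strength=-1.0)
```

The horniness/optimism side effects mentioned above are exactly what you'd expect from this construction: whatever else correlates with the contrast set gets baked into the vector too.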
aicg proxies are dying left and right, so be ready
>>101204408
you are already dead
https://github.com/ggerganov/llama.cpp/pull/8197
So about that Gemma 2 support in ollama, worked on alongside Google engineers... It was basically a shitshow?
>>101204429
nani the fuck?
>>101204444
i have to agree
>>101204408
So then pretend that 8B is good so that we can keep them from ruining the GPU market.
>>101204470
The average aicg user has a higher IQ than the average lmg user, who's able to eat their own shit and be happy.
>>101204483
/aicg/ became an extension of reddit as soon as the proxies showed up.
I can run Command R+ 5bit exl2 now.
>>101204510
No, /lmg/ is an extension of /r/LocalLLaMA.
>>101204510
all of 4chan post-2016 is just an extension of reddit where you're allowed to say nigger and retard freely
>>101204483
not the one on /g/
>>101204408
and why should I care, exactly?
>>101204532
good.
>>101203952
It's actually really interesting to see how some models from completely different companies wrote structurally very similar stories with the same prompt. It seems like there are "groups" that write in similar ways, no doubt having to do with something that goes into the training. The benchmark itself is useless because claude rating them comes with its own biases, but at least the samples are interesting to see.
>>101204444
The fact that these models are so distinct from the usual llama variant gives me a smidge of hope that once they are working properly and at full capacity they'll be better than current models in the same weight range. It would be kind of depressing to witness all the efforts to try and make the thing work, only for it to be bad.
>>101204523
Pretty much. /lmg/ is just the /r/LocalLLaMA chatroom. You can see that when something happens, no one posts a source, but people will just begin discussing it out of the blue because they get their news from LocalLLaMA but come here to talk about it. When a source is posted, it's like 3 people posting the same link at the same time they got from reddit without checking if it was already posted before.
>>101204663
I think most people use Twitter for news. Too much stupid shilling on Reddit.
>>101204663
>>101204523
redditor here and I agree.
hey lads, just returning from a break. what's the best lewd RP model these days? On a 4090. also, how does said model compare to Claude/GPT etc?
>>101204663
This is why i lurk once every few months or so, even having that tinge of redditor in me because i get my news from people who get their news from reddit feels like a tumor on my brain
CUDAdev is the only real one, still have no idea why he bothers with this place
>that said he shouldn't have suggested that idea from the other day
>>101204694
Buy an ad.
>>101204710
are you claiming i'm shilling the 4090? lmao jealous little faggot *revs fans*
>>101204707
>still have no idea why he bothers with this place
He's racist.
>>101204753
Anyone with more than 2 brain cells to rub together is.
>>101204792
*with less than 2 brain cells
>>101204245
Tbh I still haven't tested Stheno, but yeah, I think I'll stop testing SPPO. At most it's just an enhancement of Instruct, and this technique is not enough to be its own full tune, until they demonstrate one.
>>101204694
Stheno 3.2, mixtral 8x7b, Command R, possibly qwen 2.
>>101204753
>>101204792
nice
>>101204660
It's probably good, I think, but still, not great when it's still an 8k-context class.
>>101204707
>CUDAdev is the only real one, still have no idea why he bothers with this place
My opinion is that adversity is important for good discussions and actually learning things. Reddit is designed in such a way that it discourages disagreement, so you end up with a lot of dunning krugers who never get told that they're retarded. And also, assuming everything is shit until proven otherwise is a better approach to the wider AI space, where there are grifters and scammy papers/projects everywhere.
>>101204753
I definitely do not consider myself as such.
>>101204816
>0 foreskin by this ID
>>101204823
thanks anon!
>>101204819
It is a technique, after all. And there's still the question of the data they used.
>>101204872
For mixtral, try limarp zloss
>>101204866
You forgot your trip. And I remember you got banned for Russian discussions.
>>101197945
reminder that lua was created for use by petroleum engineers, not software developers
this is why lua uses pants-on-head retarded 1-based indexing
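History aside, the actual pain of 1-based indexing is the off-by-one you hit when porting loops between conventions. A minimal sketch, with Python standing in for the 0-based side and the Lua-style loop described in comments:

```python
# A Lua loop `for i = 1, #t do ... t[i] ... end` visits every element, because
# Lua tables are 1-based and the range is inclusive. Naively carrying those
# bounds into a 0-based language both skips the first element and (with an
# inclusive upper bound) would overrun the end.
t = ["a", "b", "c"]

# correct 0-based idiom: indices 0 .. len(t)-1
zero_based = [t[i] for i in range(len(t))]

# naive port that keeps the Lua-style lower bound of 1: "a" gets dropped
buggy_port = [t[i] for i in range(1, len(t))]

assert zero_based == ["a", "b", "c"]
assert buggy_port == ["b", "c"]  # off by one: first element lost
```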
>>101204866
>>101204913
I did forget my trip but I don't see what that has to do with anything.
>>101204902
I keep bouncing back to zloss, it's my "just works" model.
limaballs
>>101204957
Pretty much. I pretty much use just stheno these days, but only because I'm playing around with chaining prompts, and mixtral is pretty slow on my system. 8b I can offload fully and get instant responses, but had I slightly better hardware, I'd 100% be using 8x7b limarp zloss. Maybe the qwen2 moe too.
>>101205004
>>101205004
>>101205004
>>101204816
>median american *household* income is $56k
is this bullshit or has working in tech just really skewed my perspective here?
>>101205087
The only 56k my white ass has ever enjoyed was a dial-up modem.