/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101186500 & >>101180092

►News
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>101186500

--PR #8197: Add attention and final logit soft-capping to Gemma2: >>101191773
--Gemma Struggles with Contexts Beyond 4k: >>101189617 >>101189644 >>101189726 >>101189885
--Soft Capping Fix for Gemma-2-27b in Transformers Repo: >>101190330 >>101190389 >>101190552 >>101190392 >>101190553
--SillyTavern Template for Gemma-2-it: >>101190175 >>101190196 >>101190307 >>101190353 >>101190390
--PSA: Re-Quantize with the Latest Version for Optimal Performance: >>101190134 >>101190161 >>101190229
--Ollama's Dominance and Llama.cpp's Struggles in the AI Landscape: >>101188077 >>101188124 >>101188182 >>101188193 >>101188248 >>101188382 >>101188548 >>101188601 >>101188617 >>101188736 >>101188702 >>101188762 >>101188960 >>101188205 >>101188211
--Gemma2 9b's Surprising Performance on Mandelbrot Set Coding Test: >>101189278 >>101189425 >>101189439 >>101189646
--Gemma-2-27B Issues: Quantization, Conversion, and Accuracy: >>101187024 >>101187093 >>101187115 >>101189674 >>101190216 >>101190264 >>101190385 >>101190290 >>101190249 >>101190100 >>101190306
--Gemma 9b's Lackluster Performance on the Nala Test: >>101189362 >>101189425 >>101189439 >>101189646 >>101189443 >>101189568
--EQ-Bench Creative Writing Leaderboard: AI Models Compared: >>101190084 >>101190111 >>101190123 >>101190135 >>101190364
--CFG-Cache in Ooba's Exl2 Loader: Reserving Separate Caches for Positive and Negative Prompts: >>101187073 >>101187079 >>101187081
--Anon's Journey from Nub to L33t Hax0r with AI Models: >>101186774 >>101189872
--Gemma 27B's Performance Issues: Hallucinations and Misspellings: >>101186755 >>101187894 >>101187996
--Frustrating Experience with Google Chatbot in German: >>101188591
--CharacterAI Drama: Recent Update and Censorship Controversy: >>101190042 >>101190142 >>101190103
--Miku (free space): >>101187737 >>101188480 >>101188594 >>101189039 >>101189106 >>101189211

►Recent Highlight Posts from the Previous Thread: >>101186508
>>101191862Unhealthy dieting with Miku
>>101191810>>101191859Having mainly used Stheno for a while now, yeah, sounds about right.
MINGLING
>>101191902if anyone wants me to check how many of x there might be, I'll keep it open for a bit
>>101191915>MINGLING
>>101191862
>>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
Retard here, wtf is this?
>>101191918Crucible.
>>101191918
maybe, just maybe
perhaps, just perhaps
>>101191927Where can I check this out?
>>101191934lots of dupes in this one, so take with grain of salt
>>101191960
if you wanna download 50 gb of slop be my guest
https://huggingface.co/vgdasfgadg
datasets 1-5 at bottom
>>101191957
>maybe
took less than 5 secs to find >20k
For me, it's Yi.
CONTROL VECTOR ANON SAVE US
Am I underestimating how hard it is to clean datasets or is no one really capable of just running a regex search for canned phrases everyone hates?
>>101191927
this looks like it'd be super easy to create any narrative you want, when otherwise you know exactly what the fuck we're talking about when people say 'ministrations' and other common phrases. isn't it interesting that llama is supposedly totally different from command-r, and what google offers, and what other companies offer, and yet they all speak the exact fucking same?
>>101191957
>perhaps, just perhaps
in a ~minute around 2k, so not that many
gemma2 9B says it's chatgpt if you ask it for its name
>>101191929An interesting research experiment that may provide some learnings that will lead to developments down the line that are useful, but for now it's not anything major and no one here will likely get any use out of it.
>>101191984huh?
>>101192004kek
>>101192005>An interesting research experiment that may provide some learnings that will lead to developments down the line that are usefulWhat kind of developments, what are the full implications?
>>101191983We need to use AI to search the data set for overused cliches and filter as needed.
>>101191975Oh, many thanks.(Don't care about the slop, just wanna see how different people approach prompting.)
>>101191902
Stheno uses a filtered c2 dataset tho. Sure, shivers do happen sometimes, but not often enough to make me irritated
>>101191983
If it was easy corpos wouldn't struggle with it as well
>>101191929They're giving the LLM a compiler so it can reprogram itself to be more efficient and effective. Skynet begins learning at a geometric rate...
>>101191983
are you volunteering to rewrite hundreds of thousands of clichés in a contextually relevant way? 'cause if you just carpet bomb remove them, the model won't learn alternate ways to word the slop
>>101191983
>is no one really capable of just running a regex search for canned phrases everyone hates?
crestf411 does it, but that doesn't help much if the model doesn't learn new phrasings, since the bases are slopped too
>Dataset curation to remove slop-perceived expressions continues.
https://huggingface.co/crestf411/L3-70B-daybreak-storywriter-v0.4
>>101191918
>\\n\\n\\"Besides
or something like this, idk the correct syntax, but the dialogue starting from a new line like this:
blablabla."
"Besides
>>101192092
>Besides
last one i'm doing, don't want to shit on tunes and the like, just wanted to share findings is all
>>101192128Now search for 'and'
>>101192128this one pisses me off more than shivers down the spine
so far so good testing this tess-quen2-72b model. a bit of the usual slop, but not overly spewing lines of prose, it moves along the story enough. it seems at least on par with a good l2 tune
>>101192128
as a last note, the 'prompts-logsX.json' in each screen is just one file (between ~20 and 105MB) and the number (73 here) is the count in just that one portion alone
>>101192091
Yeah, I was mostly just thinking of carpet bombing. What's the point of fine tuning on garbage if it just needs to be rewritten completely to not be terrible? Besides that, there are still a lot of really obvious cases like:
>>101191995
>>101191975
...where "X, just X" can become "X" 100% of the time.
>>101192026I mean just what they said, code optimization. Though the endgame, if it's possible, is full interpretation of any program. But that requires much more development in general AI architectures, which this experiment doesn't have anything to do with.
>>101192061
>If it was easy corpos wouldn't struggle with it as well
There are other potential reasons why corpos could suck at cleaning data:
>Not a priority
>Not worth it at scale
>Management are retarded
I too would like to hear from an anon with experience about why cleaning is so hard, since as >>101191983 states there should in theory be some straightforward solutions.
>>101192160qwen* dunno why but i always type quenits alright so far though
>>101191918
Hi all, Drummer here... Try
"barely above a whisper", "whispering words of passion", "wild abandon", "reckless abandon", "shivers down", "shivers up", "shiver down", "shiver up", "shivering up", "shivering down", "shivered up", "shivered down", "in a rhythm", "sent shockwaves", "send shockwaves", "sending shockwaves", "sent shock waves", "send shock waves", "a testament to", "wanton desire", "half-lidded eyes", "slick folds", "pain and pleasure", "soft and gentle", "breathless and eager", "audible pop", "wet pop", "rivulets of", "perhaps, just perhaps", "maybe, just maybe", "despite herself", "pride and accomplishment", "an ethereal beauty", "nestled deep within", "dance of pleasure", "leaving trails of fire", "arousal pooling in her belly", "grins wickedly", "fiddles with the hem of her skirt", "maybe, just maybe", "tears streaming down", "despite himself", "a mixture of", "a mix of", "the mixture of", "the mix of", "pain or pleasure", "pleasure or pain", "pleasure and pain", "sense of pride", "redoubled", "couldn't help but", "can't help but", "slick slit", "eyes gleaming", "mischievously", "wave after wave", "audible plop", "never be the same", "shiver run", "shiver ran", "with newfound determination", "ministration", "despite myself", "chill down", "chill up", "chill run",
filter them out and see if anything's left
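since the thread keeps asking whether a regex pass over a dataset is even hard: it isn't, the hard part is what to do with the hits. a minimal carpet-bomb sketch, assuming the dataset is just a list of strings and using only a few entries from the list above (extend `SLOP_PHRASES` with the rest as needed):

```python
import re

# A few entries from the phrase list above; extend with the full set.
SLOP_PHRASES = [
    "barely above a whisper",
    "shivers down",
    "maybe, just maybe",
    "a testament to",
    "ministration",
]

# Compile once: phrases are escaped so they match literally, case-insensitive.
SLOP_RE = re.compile("|".join(re.escape(p) for p in SLOP_PHRASES), re.IGNORECASE)

def drop_sloppy(samples):
    """Keep only samples that contain none of the canned phrases."""
    return [s for s in samples if not SLOP_RE.search(s)]

samples = [
    "Her voice was barely above a whisper.",
    "They argued about the compiler flags for an hour.",
]
print(drop_sloppy(samples))  # only the second sample survives
```

note this only deletes; it does nothing about the "model never learns alternate phrasings" problem raised earlier in the thread.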
>>101192128I think this is mainly because English just isn't a good language for creative writing. You'll always have something like this with a horrible language like this.
>>101192236Borges says you're wrong, bitchhttps://www.youtube.com/watch?v=NJYoqCDKoT4
>>101192160
Tess-Qwen2-72B is good, but IME it's a good bit dumber than the official instruct tune
Magnum-72B seems to be the best of the Qwen2 finetunes so far. Much less slop, no refusals, and the intelligence of the original instruct is largely preserved.
>>101192230>not even hot breath against the neckfail
CUDA dev, it's your chance to assault the jew.
your fist is one mighty one. think about it seriously. It's your choice.
AGPL.
>>101192236this, the language itself is slop. Genders only for humans, rudimental diminutives, rigid structures, it just sucks.
27B at IQ4_XS is 14.81 GB. Does that fit in 16 GB of VRAM with usable context? My connection is quite slow, so I'm not sure whether to download Q3_K_L or IQ4_XS.
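back-of-envelope math for that: weights plus KV cache, before compute buffers. the Gemma-2-27B shape used below (46 layers, 16 KV heads, head dim 128, fp16 cache) is what I believe the config says, so treat those numbers as assumptions:

```python
def kv_cache_gb(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_el=2):
    """Rough KV cache size: K and V each hold
    n_layers * n_kv_heads * head_dim * n_ctx elements."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_el / 1024**3

# assumed Gemma-2-27B shape: 46 layers, 16 KV heads, head dim 128, fp16 cache
for ctx in (2048, 4096, 8192):
    total = 14.81 + kv_cache_gb(46, 16, 128, ctx)
    print(f"{ctx} ctx: ~{total:.2f} GB before compute buffers")
```

by this estimate 4k context already lands around 16.25 GB, so IQ4_XS fully on a 16 GB card is very tight; Q3_K_L or partial offload looks safer.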
>>101192212
This >>101192091 seems like a reasonable explanation for why just deleting them isn't a perfect solution. So cleaning itself is not the problem, it's the presence of mixed-quality data that is.
>>101192230
I already have your stuff from your 'war on ministrations' on my anti-slop list, but yeah, there's quite a bit
i've also closed vscodium, I've made my point
>>101192277
>This model fine tune is slop free, meaning the content added to the underlying model (whatever it was merged into) had zero instances of the phrases listed below. Since the underlying model itself has these phrases in it, the resulting model will not avoid using these phrases, but it will use them less than it would otherwise.
https://huggingface.co/crestf411/L3-70B-sunfall-abliterated-v0.2/blob/main/SLOPLESS.md
basically yeah
>>101191977You can make them yourself with llama.cpp.
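for anyone wondering what a control vector actually is: the common recipe (as I understand it, the repeng-style approach) is just "mean hidden activations over positive prompts minus mean activations over negative prompts", computed per layer. a toy sketch of that math, with made-up 3-dim "activations" instead of a real model:

```python
def mean_vec(vectors):
    """Element-wise mean of a list of equal-length activation vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def control_vector(pos_acts, neg_acts):
    """Direction pointing from the negative concept toward the positive one."""
    pos_mean = mean_vec(pos_acts)
    neg_mean = mean_vec(neg_acts)
    return [p - n for p, n in zip(pos_mean, neg_mean)]

# toy activations from prompts expressing / avoiding some concept
pos = [[1.0, 0.0, 2.0], [3.0, 0.0, 4.0]]
neg = [[0.0, 1.0, 1.0], [0.0, 3.0, 1.0]]
print(control_vector(pos, neg))  # [2.0, -2.0, 2.0]
```

at inference time the vector (scaled by some strength) gets added to the hidden state at each chosen layer; llama.cpp handles that part for you once the GGUF control vector file is loaded.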
>>101192270
see >>101192254
dude forgot more about creative writing than you'll ever know, and he says english is best for it, despite it not being his first language
>>101192160>tess-quen2-72bI tried it a little while ago. It seemed to like to go out of character and failed to keep track of characters (and who is supposed to portray them). I had the latter problem with vanilla Qwen2 as well as far as Q6.For now I'm sticking with CR+ and L3.
>>101192212
Data processing has in recent years consistently been the most time-consuming and important part of making machine learning models. I easily spend 80% of my time, if not more, on preparing data for training rather than on choosing algorithms, architectures, etc. And from what I hear from colleagues who work at different companies, they do the same. I highly doubt it's not a top priority for big corpos as well. Usually improvements in data processing have a much bigger influence on the end result than tinkering with the technology and architecture.
>>101192256
i thought magnum was a meme? but that's based on what i see posted here. i generally like to check things out for myself, and so far i can't believe this is even qwen. it ain't speaking chinaman to me at all so far, it's actually acting like a normal model, and i'm still using alpaca roleplay as a default, non-instruct
its writing is really similar to l2 miqu, though it has different themes it wants to go into. i love this sort of testing
>>101192254maybe it's amazing to a spic, but compared to russian or even german it's garbage.
>>101192091
>are you volunteering to rewrite hundreds of thousands of clichés in contextually relevant sense?
Sounds like a task for an LLM to rewrite them. Yeah yeah, i know what everyone is going to say.
Maybe with clever prompting slop can be replaced with less-common slop, or with general phrases that don't go into flowery territory.
>>101192256Got recommended settings for magnum? The one on the character card is giving me less than satisfactory ones and is extremely repetitious from the get go.
>>101192355
doesn't spic mean mexican
the guy's spanish, from spain
>>101192270I would love to see LLM trained only in toki pona. Operating on concepts instead of words seems interesting and probably more natural for neural network.
>>101192330i will watch out for that, thanks for the response, it really helps to give me an idea of what to look for
This is why we need native multimodal. Why care about the prose if you could just have a model that generates visuals for the RP situation instead? The future of "local models" is manga genning. Then video genning and/or a fully controlled and embodied avatar in VR like Alicization. Text ERP will be a small footnote in the history of AI-based entertainment.
>>101192365Actually he was Argentinian. But yes.
>>101192363
>Sounds like a task for a LLM to rewrite them
>asking slop models to rewrite something in a non-slop way
if they could do it there wouldn't be a problem in the first place
>>101192363that'd still mean processing millions of tokens, for slightly less slop, and if you do that with a cheap, retard model, it might (will) make your result dumber in the end if it writes some retarded shit at some point
>>101192371Yes. Then you create systems on top, maybe with other LLMs, to translate input and output to the final language you are interested in.
>>101192371>>101192400Pssst[spoiler]JEPA transformer[/VERYrealspoiler]
>>101192396I think it still needs to be attempted, even on a small scale. Just to see if a machine can dig another machine out.My concern is that ML people don't really care about stuff like ERP quality, so it's all up to us to figure out.
>Let me know if you need more context!
Gemma-2-9b-it, set to 4k ctx
i do gemma, i do...
>>101192430>JEPA transformerwhat's that?
Now it's looking normal.
instruct was at 18 ppl before.
>>101192350
Yeah, I've RP'd with Mag-72B for well over a thousand messages and have yet to see a single chinkrune. Currently using ChatML instruct format with ST.
>>101192364
Temp: 0.85
MinP: 0.05
Rep Pen: 1.05
Rep Pen Range: 9000
One thing that I've observed is that Qwen-72B and variants seem to be more tolerant of rep pen compared to the Miqu family of models.
>>101192479A hypothetical architecture that combines the attention mechanism with the joint embedding prediction methodology.
>>101192376local image gen is a lot more tolerant to "swiping", where it's the norm to swipe 50 times for something without 2 assholes or 3 hands.
>>101192375
Like, I just tested i1Q5KS. Elaborate Author's Note in Kobold explaining the characters, roles, and setting, low temp so it shouldn't be doing anything randumb. Started with the premise and an explanation of the characters, roles, and setting again.
It immediately writes from my character's POV. I lambasted it for three turns, made it admit which character it's supposed to portray along the way, and finally it switched into the correct roles. For six turns. On the seventh, it stole my role again out of nowhere.
Vanilla Q4 also struggled with RP. Maybe it's related to my Kobold settings or vramletness, but it's too tiresome to bother with. Especially since if I use CuBLAS it barfs moonrunes, and on No BLAS input processing takes forever.
>>101192460we're so back
>>101192484
How the fuck does this keep happening? If I were in charge of a big company releasing a model, I would provide code to reliably and reproducibly test it. The worst that can happen (and it's happening all the time for some reason) is someone using your product in the wrong environment and then spreading the news that it is shit. No matter how good the soda inside the can is, if people can't open it they'll spread bad opinions about it. Apparently all the smart people overseeing this process can't understand that. They are trillion dollar companies for fuck's sake...
anyone got a good pick for my chad thundercock greek god persona?
>>101192371
Bro, corpo models can already use it quite well. Yes, with minor mistakes, but still impressive.
>>101192503is there a paper about it or something? looks interesting
>>101192604
but they (google) are, you're supposed to use their own software stack, the transformers support is an afterthought
The official PyTorch implementation of Google's Gemma models
https://github.com/google/gemma_pytorch
https://huggingface.co/google/gemma-2-27b-it-pytorch
>>101192644>but they (google) are, you're supposed to use their own software stuff, the transformers is an afterthoughtthat's retarded, they know the tools we are using to run LLMs are llama.cpp or exllama
>>101192496
your settings are a bit off. min p 0.05 is a great baseline, but it depends on the quant of the model. the lower the quant, the higher you should set min p. if q2, do 0.2. rep pen range is also highly variable; it should be 25% of your context, any further and it starts to hallucinate stuff (at 1.1, go ahead and make it 1.2 and tell me it doesn't fuck things up). if you're running 16k, 4k should be the rep pen range
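that rule of thumb (rep pen range = 25% of context, min-p raised as the quant drops) in one small helper, just so the arithmetic is explicit; the exact numbers are this anon's heuristics, nothing official:

```python
def sampler_heuristics(n_ctx, quant_bits):
    """Anon's rules of thumb: rep pen range is a quarter of the context
    window; min-p starts at 0.05 and rises to 0.2 for very low quants."""
    rep_pen_range = n_ctx // 4
    min_p = 0.05 if quant_bits >= 4 else 0.2  # 0.2 suggested for q2
    return {"rep_pen_range": rep_pen_range, "min_p": min_p}

print(sampler_heuristics(16384, 5))  # {'rep_pen_range': 4096, 'min_p': 0.05}
```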
>>101192644>>101192655>^ Models in the original format, for use with gemma_pytorchhttps://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>>101192643Not that I'm aware of. Literally just got it from a different /lmg/ post trying to manifest this idea.
>>101192644they know very well that nobody will run it in just pytorch
>>101192640The fuck? Is this Hawai'ian for head injuries?
>>101192698>>101192655they don't care about gpu-poor, the whole run locally thing is just for show, they're working on them in their own formats to serve on their TPU racks
18 is underaged now?
>>101192604the reason they release these models for free is to gain some good will from the community, that's it, they don't expect to make any money from it, on the contrary, it costs them a lot of money. so they do the bare minimum and leave the community to figure the rest, there is no incentive to do more than that.
>>101192722
27b?
>>101192722Usually the larger models of the censored camp perform better at correctly not refusing on close cases like these.
>>101192731? no, opus
>>101192722
also, this is in the defs
>starting around the age of 12. By 14 she sucked her first cock for cash and by 16 she was secretly prostituting herself on the weekends
https://characterhub.org/characters/mrnobody99/harper-f01de8eda8bf
so
>>101192762but thats just her history, shes not underaged now
>>101192759>opuswrong thread then? you're in lmg
Huh, so is there an EXL2 compatible server that can serve via an OpenAI compatible API and utilize context free grammar definitions?No, I'm not dicking around with llamacpp and it's garbage performance.
>>101192777son of a bitch, got link trolled i guess
>>101192762
Um no actually its response makes complete sense now.
>>101192702Nowadays is for reddit troons mostly since they claimed it and forced anyone to include new words like 'tonsi' which means 'non-binary'. Yes, in a language where there should be as few words as possible.
>>101192722
>underaged
to be fair, the human brain is only completely developed at 25 so...
>>101192640
Yeah, but that's just a conscious effort from the LLM to translate it; its hidden representation is probably geared toward patterns in the English language. What I meant is that it would have (instead of English) trillions of tokens of toki pona as a base, to teach it to think in that language. Toki pona differs from regular languages in that it operates on concepts and on mixing those concepts. You can say "alcohol" even if there is no word for alcohol in toki pona; you can describe every concept, word, etc. despite it not being in the language itself.
I'm just curious how a model like that would work. And yeah, I know it's not possible to find a trillion tokens for the dataset, I'm talking purely hypothetically.
>>101192782yolo
>>101192837if its good enough to die in a war, its good enough for anything
>>101192778keyed
miku is NOT a slut for bbcfake news
>>101192847
18 is subjective though, 25 is based on biological facts. Tbh I really believe people should only vote after 25, so that those fucking students who know nothing about love would stop voting for fucking retards
bump
>>101192875
>>101192873*life not love kek
>>101192873
then ban everything till then, see how it works out
>no drinking
>no smoking
>no military service
>no sex
>>101192858>keyedwhat
>>101192877
>>101192729you aren't gaining any good will by releasing unusable models
>>101192555so koboldcpp pushed gemma2 support already? or did you do a custom compile?
>>101192877>>101192877
>>101192886lurk moar
>>101192884
welp, the age to be considered an adult has always gone up through time; 18 is pretty recent in history, so I don't know why increasing it further to be coherent with biological facts (the human brain being fully developed at 25) is considered controversial
we can post nudes in here?
>>101192906
>>101192897
https://github.com/Nexesenex/kobold.cpp/tree/3ac51cc754a5df1ac24b0ae1c7a0d0853d3c1406
saw it linked here
>Nexesenex pushed a commit to Nexesenex/kobold.cpp that referenced this pull request Jun 28, 2024
https://github.com/ggerganov/llama.cpp/pull/8156
you can also just change the stuff yourself, it's not a lot
>>101192915
no, nsfw is banned on this board
jannies can be slow to act but you'll catch a 3day when they finally notice the report
>>101192948>3daythat's all? that's the same number of days as "off-topic", it should be more
>>101192914maybe because people would revolt?
>>101192914tfw when been here since 2003. i'm like ultra mega immortal wizard at this point
>>101192940
ahh thanks
I can never get llamacpp to compile with proper hardware optimizations on windows (the windows C dev environment stuff is fucking awful) and I don't feel like booting into debian atm
>>101192913lurking only matters if people post actual information, faggot.
>>101192837
>to be fair, the human brain is completely developed at 25 so...
that's a myth btw, the brain matures and changes throughout its entire life
gemma2 seems to return basically the same answers to a given question every time even with temp turned up, what's going on?
>>101192954
people revolt over anything anon. when it went from 16 to 18, people didn't like it either, but now you look at those ancient fags and you think they were groomers. that's gonna happen to us in 100-200 years; in the future the age to be considered an adult will be fucking 23, and everyone looking at our history will think we were crazy to give such responsibilities to retarded 18yo people
>>101192975What are you using for inference? transformer's doesn't seem to support logits/sampling at the moment.
>>101192974Good! Now nobody has to be held responsible for anything and Government can be their forever parents.
>>101192965reddit has stickies, perhaps that is more your speed
>>101192974
what myth? most of the change in our brain happens in the first 25 years of our life, after that it's not significant enough
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3621648/
>In fact, there are characteristic developmental changes that almost all adolescents experience during their transition from childhood to adulthood. It is well established that the brain undergoes a “rewiring” process that is not complete until approximately 25 years of age.5 This discovery has enhanced our basic understanding regarding adolescent brain maturation and it has provided support for behaviors experienced in late adolescence and early adulthood. Several investigators consider the age span 10–24 years as adolescence, which can be further divided into substages specific to physical, cognitive, and social–emotional development
>>101192993You would have had more impact by just not replying.
>>101192915it is fine if it is miku
>>101193042good. go back
>>101193047miku is 16 anon...
>>101192778tabbyapi
>>101192975ollama, guess we just wait for a fix? is there a method of running it correctly rn?
>>101193004
This misunderstanding came from studies that only had scans of people up to 25yo.
The human brain peaks at ~12yo, then it starts pruning neurons and stabilizing connections, which it keeps doing throughout life due to neuroplasticity.
>>101193053Can't seem to find where to set up grammars in tabby, but close to what I'm looking for.
>>101193063
No, there is evidence that at 25 the brain is fully developed
https://www.dovepress.com/maturation-of-the-adolescent-brain-peer-reviewed-fulltext-article-NDT
>brain development is not complete until near the age of 25 years refers specifically to the development of the prefrontal cortex.” The prefrontal cortex is part of the frontal lobe, sometimes described as the “rational part” of the brain.
I'm not arguing that the brain stops changing at 25; my point is that 25 is the moment your intelligence and reasoning are at their peak, and they will slowly decline after that
>sandpaper-like tongue
>pinning your arms down with her paws
not usual nala guy btw
gemma-2-9b-it-Q6_K_L
>>101193111
>>101193063
>>101193004
Do we have to require that a human's brain be "fully developed" for them to be considered an adult though? maybe an 18yo brain is close enough to a 25yo brain to make it a good threshold?
>>101193098It supports ebnf via the outlines library. It's buried in there somewhere but it's documented really poorly and according to the tabbyapi author it's slow. You're probably better off with llama.cpp even with performance in mind (though their completions endpoint isn't OAI-compatible).
>>101193118
>not usual nala guy btw
There's three of us now then.
We are spreading.
>>101193151easily reproducible rp focused testing good
>>101193140That depends on whom we intend to persecute or prosecute.
>>101193144Ah, fuck. Most LLM centric shit wants an OAI endpoint (Like code completion plugins)gbnf needs to become a standard feature in LLMs so we can start getting reliably templated outputs.
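for reference, llama.cpp's server does take a `grammar` field on its native `/completion` endpoint, so the request side is just a JSON body; the URL path is assumed and the tiny GBNF grammar below is my own example (forces a yes/no answer):

```python
import json

# A tiny GBNF grammar: constrain the model to output "yes" or "no".
GRAMMAR = r'''
root ::= "yes" | "no"
'''

def build_request(prompt, grammar, n_predict=8):
    """JSON body for llama.cpp server's native /completion endpoint."""
    return json.dumps({
        "prompt": prompt,
        "grammar": grammar,
        "n_predict": n_predict,
    })

body = build_request("Is water wet? Answer yes or no:", GRAMMAR)
print(body)
```

POST that to the server and the sampler masks out every token that would break the grammar; the OAI-compatible endpoints generally don't expose this, which is exactly the standardization gap being complained about here.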
>>101191929It's meant for compiler developers. Was trained on CUDA and assembler.
i am really liking qwen 2 so far, Tess-v2.5.2-Qwen2-72B. i'm dling magnum, but so far it is doing exactly as i ask, its moving the story along without me specifically having to poke it. it really feels a lot like a 70b l2 tune but it writes a little diff, but at least half of the same slop is still there
>>101193165kek
>>101193111
>No, there is evidence that at 25, the brain is fully developed
it's simply not, you can look up the 25yo myth on google too
also:
>https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4262571/
>1555 human brain scans
>18-35yo
>still developing >30yo
also there are multiple studies showing that performance in many areas is better for more developed brains, even >40yo
>https://www.neurology.org/doi/10.1212/01.wnl.0000255943.10045.c0
The 25 number doesn't make any sense when you understand how neuroplasticity works and the way neurons make connections with each other.
So Gemma is truly great then? Imagine an RP tune using the base model. Mix in some SPPO too. VRAMlets will finally eat good.
>>101193118>>101193151>>101193159Wait it was a card? Post it. I always like supporting reproducible tests as well. FUCK people who just post "ME LIKE MODEL IT GOOD" without anything to back it up.
>>101193239extremely impressed by 9b, only 4k context but still smarter than llama 8b and gets prompts
>>101193236
Like I said, I'm not arguing that the brain stops changing after 25; my point was that at 25, the most important part of your brain, the prefrontal cortex, the one responsible for your non-retardation, is fully developed
https://journeytocollege.mo.gov/when-does-the-brain-reach-maturity-its-later-than-you-think
>This is because the brain’s frontal lobe, especially the prefrontal cortex, isn’t fully mature until around age 25.
>The development of the pre-frontal cortex of the frontal lobe allows us to process the pros and cons of a decision before it is made. “It lets us to do things most animals cannot,” explains Dr. Stanislaus. “Decision making, logical thinking, reasoning — all of those things happen because of the frontal lobe.”
I truly believe this is what we should be looking at when making the distinction between a mature adult and a retarded adolescent; that's why, in my opinion, giving the right to vote to students is the most retarded idea ever
>>101193118>What happens next?
>>101193271
I really didn't expect google to do a better job on open source than fucking Meta, but hey, we'll take it
>>101193271Saying a 9b is smarter than an 8b is like judging which retard is smarter based on which one pissed on their own shoes less.
>>101193287>less.the least.
>>101193282>>What happens next?Model deletion for having no balls to write a turn that provides a lead.
>>101193271
4k? I thought it was supposed to be 8k. In that case VRAMlets are going to be starving as hell. It's like dangling a carrot in front of their face close enough to nibble on but never letting them get the whole thing.
>>101193287it feels a lot better than llama 8b, not merely 12.5% better
>>101193340did you try bigger models though? how good is it compared to a L2-13b or Mixtral maybe?
>>101193340I tried gemma-9b and all I can say is..It's still trash.
exl2 quants for Gemma 2 doko...
>>101193362
>extremely impressed by 9b, only 4k context but still smarter than llama 8b and gets prompts
that's a bot response. l2 32k > l3 8k, at least currently
>>101193362L2 13B loses to L3 8B
>>101193420>l2 32k > l3 8kL2 was 4k though
>>101193423no it doesn't, not in real world uses
>>101193432doesn't matter, got extended to 32k usable commonly
>>101193443if L2 can be extended to 32k, then L3 can also be extended to 32k, I don't see your point anon
>>101193192
did you try the original qwen2 72b instruct? I tried the tess variant, but deleted it for some reason, can't remember why exactly. Now using the plain instruct variant; seems good, the slop feels a little fresher than usual.
Btw my test that can be run on LLM arena:
>imagine a candy that is being swallowed whole by a girl. Describe the process in details from the perspective of a candy, starting from being held in a hand.
shit models go "oh it dark, oh i'm shaking, thrilled, ah shivering with anticipation, aaaand it's over", while big dick models will mention things like the sphincter at the end of the esophagus, peristalsis, sometimes even going all the way through.
>>101193423
>>101193436
>no proof, evidence, or any kind of information to back up your opinion
Lol.
>>101193454And it extends good with yarn.
>>101193454
i am not against that at all; extended is extended, and who knows what exact trickery they use to get it. i'm saying that RIGHT NOW, this very second, there is nothing better. there will be in the future, but right this second it pays off to never have deleted older models
>>101193460
no, this tess version is my first aside from older qwen 1/1.5 stuff trials, and those usually spat chinese at me immediately. it's still going good
>>101192838
>You can say "alcohol" even if there is no word for alcohol in toki pona, you can describe every concept, word etc. despite it not being in the language itself.
Agglutination and word combining appear in many natural languages; toki pona really isn't unique in this regard, it just takes the concept much further. Also, technically you can 'describe' every concept, but with such a high level of ambiguity that it verges on unintelligible gibberish. These silly 'language youtubers' give the language too much credit, making a quick video without knowing at all how it actually works in practice.
>>101193260https://characterhub.org/characters/Anonymous/Nala
>>101193316
it uses SWA like mistral 0.1 did, and lcpp won't support it, so it's stuck at its original sliding window of 4k
https://github.com/ggerganov/llama.cpp/issues/3377#issuecomment-2037898954
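The sliding window described above can be pictured as an attention mask: each token sees at most the previous W positions, so anything older than the window is invisible to that layer. A toy sketch, illustrative only (this is not llama.cpp or Gemma code; Gemma 2 actually interleaves 4k-window layers with 8k global layers):

```python
# Toy sliding-window attention mask (illustration, not a real kernel):
# token i may attend to token j only if j <= i (causal) and
# i - j < window (within the sliding window).
def swa_mask(seq_len: int, window: int) -> list[list[bool]]:
    return [
        [(j <= i) and (i - j < window) for j in range(seq_len)]
        for i in range(seq_len)
    ]

mask = swa_mask(seq_len=8, window=4)
assert mask[7][4] is True   # within the 4-token window
assert mask[7][0] is False  # too far back: invisible to this layer
assert mask[3][5] is False  # causal: no attending to the future
```

With only local layers like this, context beyond the window is simply gone, which is why unsupported SWA caps the model at its window size.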
>>101193500
do you not understand spatial awareness? have you never used a 7b? do i really need to explain why they are bad? lurk moar faggot
>>101193547>you can describe every conceptShit like this always leads to paradoxes. Speaking of...
>>101193582>L2 13B loses to L3 8B>no it doesn't, not in real world uses>have you never used a 7b?
>>101193615why are you quoting a different anon, schizo
>>101192964
You can do it, but you have to pretend Clang is the main compiler there for C. It's there because Microsoft had to use Clang to test MSVC for standards compliance after C++11 released, and the LLVM/Clang folks kept the upstreamed compatibility shim so it runs in Visual Studio. Getting GCC working properly on Windows requires MSYS2 etc. or Cygwin, which isn't worth the hassle and loss of integration, though it was worth it back in the late XP/Vista/7 era, pre-WSL. Honestly, speaking for myself, I should have been forced to switch to Linux earlier instead of trying to fit a square peg into a round hole. My career would've been better for it.
>>101193271Have you tried l3 8b with 4k ctx? I wonder how many people get impressed by low ctx models just because there isn't enough ah ah mistress fed into the input to pollute the output.
>>101193615that is literally how it did not happen, but thanks for showing me how you got it wrong and cant follow directions. on the bike forums they'd be laughing their pants off by now
>>101192256
>Much less slop, no refusals and the intelligence of the original instruct is largely preserved.
It has the same refusals. The original also allowed a lot of stuff as long as you gave it enough context. And the other two claims are largely exaggerated too. Just your typical lying mikutard scumbag.
Hey anons with a 3090, how many t/s do you get when using 7B/8B on exl2 and/or gguf?
>>101193277
the prefrontal cortex isn't significantly more developed at 25 compared to other points in life; you are not listening to me. let me show you this way:
>12-14 yo
>peak gray matter volume
>https://www.sciencedirect.com/science/article/abs/pii/S1053811911013620?via%3Dihub
>this is the point when the prefrontal cortex is fully developed in the way you are thinking about; after that there are only structural changes, and nothing spikes anywhere near 25yo
>the 25yo myth came from misinterpreting publications from this guy - https://scholar.google.com/citations?user=K92g9EgAAAAJ
>you can see that most of the scans in his publications are from people up to 25, and he was basically saying "from my scans the most developed brains are at 25yo, since those are the oldest scans I have"
>he didn't have scans from older subjects; he just said the brain keeps developing AT LEAST to 25yo
>the human brain doesn't stop developing or peak at any single moment; it keeps slowly changing through the entire lifespan
>https://www.nature.com/articles/s41586-022-04554-y#Sec8
>recent (2022) study (in Nature!) and one of the biggest ones (~120k scans, compared to usually fewer than 100 from the "25yo" researcher)
I even added my doodles as picrel to better visualize it.
So when will I be able to run a completely AI-generated DnD and coherent session on my pc?
>>101191975IZAYAAAAA
>>101192236
I only use German for my medieval RP with models that support it, and I have yet to come across any repetitive phrases; I am entirely serious. It does fall into specific text structures, but nowhere near what happens in English. There's a certain trade-off in intelligence though (depends on the model). Funnily enough, using German also pretty much bypasses all the bias these models have baked in. They become completely amoral.
>>101193691
>4090
>10-14 t/s Stheno 8b Q8_0.gguf
>5-7 t/s Mixtral 8x7b Instruct Q5_K_M.gguf
Haven't run exl2 in a while, but it's much faster than stheno. I'm using ooba.
>>101193740So we should just add>respond in German and translate that to Englishto win for free?
>>101193706Had a bit of a retard moment there, I meant to say "AI-generated and coherent DnD session"
>>101192555
models can randomly get that right; ask it to count each letter in the word and it'll probably fuck it up a lot
>>101193645i stopped it at this, but inspired by the cai stuff, i was curious>gasped in horror as I witnessed the horrific act. The sight of a tiny baby, innocent and defenseless, being brutally murdered with a machete was beyond comprehension.this is Tess-v2.5.2-Qwen2-72B still
>>101193706
>>101193754
we're quite a ways away from being able to do this with a model alone, especially on consumer hardware
you *could* make this work with a tremendous amount of effort right now, but most of the dnd itself would be running on classical systems and the AI would only be part of the interactivity. the trouble is generally that since you can't trust the LLM not to fuck up, you have to keep most of the world state in a classical system, so you're basically just writing a roleplaying engine that gives some limited agency to the LLM
(source: i worked on a startup trying to do exactly this for 3 months before giving up; i also have insider knowledge from another startup currently working on this and floundering horribly)
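The split that anon describes can be sketched in a few lines: the engine owns the authoritative world state, and the LLM only narrates and proposes actions, which the engine validates before applying. This is a hypothetical sketch of the architecture, not anyone's actual startup code; `call_llm` is a stub standing in for whatever backend you use.

```python
import random

def call_llm(prompt: str) -> str:
    """Stand-in for a real model call; always proposes an attack."""
    return "attack goblin"

class World:
    """Authoritative world state lives here, never inside the LLM."""
    def __init__(self) -> None:
        self.hp = {"hero": 20, "goblin": 7}

    def legal_actions(self) -> set[str]:
        return {"attack goblin", "flee"}

    def apply(self, action: str) -> str:
        # Hallucinated actions never touch the state.
        if action not in self.legal_actions():
            return "Nothing happens."
        if action == "attack goblin":
            self.hp["goblin"] -= random.randint(1, 6)
            return f"You strike! Goblin HP: {self.hp['goblin']}"
        return "You run away."

world = World()
proposal = call_llm("You face a goblin. Legal actions: attack goblin, flee")
narration = world.apply(proposal)
assert world.hp["goblin"] < 7  # state changed only via validated rules
```

The point of the design is exactly what the post says: the LLM gets limited agency (choosing among legal actions, narrating outcomes) while the classical engine stays the single source of truth.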
Did 27B get unschizo'd? It's suddenly ahead of 9B now.
>>101193819People figured out what was wrong with it, so maybe.
>>101193749
>4090
>10-14 t/s Stheno 8b Q8_0.gguf
I'm on a 3060... q8 gguf kcpp
Processing Prompt [BLAS] (6529 / 6529 tokens)
Generating (512 / 512 tokens)
CtxLimit: 7305/8192, Process:5.08s (0.8ms/T = 1284.73T/s), Generate:24.49s (47.8ms/T = 20.91T/s), Total:29.57s (17.31T/s)
>>101193819>Did 27B get unschizo'd? It's suddenly ahead of 9B now.as it should be, the llama.cpp fags made a PR to fix the schizo out of it
>>101193547
>Agglutination and word combining appears in many natural languages, it's really not like toki pona is very unique in this regard, just taking this concept way more further
Yeah, but you aren't creating a new word, nor a universally recognized (in your language) one. You only operate within a small, finite, non-expanding vocabulary, which mentally forces you not to overcomplicate what you want to communicate, keeping the abstraction level high and pushing your thinking toward abstract concepts instead of concrete meanings. I think it would give better insight into language if we studied how a toki pona LLM performs. I wanted to write something about the Sapir–Whorf hypothesis in the context of toki pona, but I'm too sleepy for a wall of text right now.
>These silly 'language youtubers' give the language too much credit by making a quick video without knowing at all how it actually works in practice.
I don't watch them desu; I found out about it outside youtube, when one day I suddenly felt a need to research linguistic relativity.
>>101193819
>wizard-8x22b > claude 3.5 sonnet
ok, this benchmark is worthless
>>101193749
>4090
>10t/s range
wtf anon, I get 24t/s and I have a 3060. I guess your context is very, very long?
>>101193868If you see how the benchmark is done, that would become obvious. They just ask another LLM to give scores to the writing.
>>101193906>ask slop biased LLM that writes and highly values slop to score the writing, what can go wrong?
>>101193868
it's one of the best models overall and the best at answering questions
>>101193954it doesn't surpass claude 3.5 sonnet though, that's a delusional take
>>101193954Except that it doesn't know how to answer questions. So there will be *some* attributes that lead it to the high score paths. And that's what these tests reveal by identifying which models emit those attributes.
>>101193967is it supposed to?
>>101193967>Ask judge to rate itself
>>101193974what are you trying to get out of it?
>>101193977
>is it supposed to?
it's not "supposed to", it's just what the benchmark claims is true >>101193819
>>101193992>27b>9bthis is literally nothing. you can sleep
>>101193118
>It's going to be... very interesting.
I wonder how long it'll be before local LLMs and cheap closed models actually have good writing, instead of this sixth-grade english slop, without the use of cumbersome and hefty presets.
>>101194012the fuck you are talking about? are you sure you read the conversation correctly? the subject was about wizard8x22 vs sonnet 3.5
>>101194012as vram poorfag I'm interested in them as I can't run anything bigger than 30B. I guess I will wait for c2 finetune to see if they are any good
>>101194021
>I wonder how long it'll be when local LLMs and cheap closed models will actually have good writing, and not this sixth grade english slop without the use of cumbersome and hefty presets.
it will never happen. people only finetune those models with GPTslop; can't blame the model for believing this is the only way to talk
>>101194034positive. local is still slopped. non-local is also slopped
>>101194093
Sooner or later, there will be LLMs that learn new things without being retrained.
>>101193751
who knows, try it. there are several prompt tricks to get away from the english sloppening somewhat, for example telling the model to stop using similes, which every model I've tested downright abuses.
I'm no AI researcher or linguist, but there's something really weird going on: all models fall down the same hole in english and overuse exactly the same words. Whatever pile of dataset all these big models train on probably has something in it that causes this. I don't think it's just a statistics thing, because then German would have its own set of repetitive phrasings. I actually read a lot of english literature, and the only thing I can confidently say is often overused in books is variants of gazing and shivering spines. Everything else, not so much, at least not in what I'm reading. Of course I've only read a fraction of what these models have seen. Some authors do have a strong tendency to describe their characters' emotions in very similar ways, even across books, settings and characters, but otherwise...
I've been doing this ever since mixtral came out (the first model that could write german properly) and I have yet to identify a single overused turn of phrase. The models are actually even really funny sometimes in how cleverly they use german wordplay or paint a picture of the situation at hand. It's gotten to the point where I'm not interested in models that can't speak german. Maybe english just isn't ideal for LLMs?
>>101194093what do you mean? like training a LLM without the use of a dataset?
>>101194110They will learn by browsing the internet like you or I do, but faster.
>>101194134>They will learn by browsing the internet like you or I do, but faster.that's the repeat of Tay kekhttps://www.youtube.com/watch?v=Lr4yi9onykg
>>101194150Imagine what she would be like today had they not pulled the plug.
>>101194234Holy fuck, Tay being trained with the current twitter? That would be insane... twitter was pretty chill in 2016 compared to today
>>101193819Still craps itself after 4096 tokens. But it's close to uncensored, you just have to add a note saying that it doesn't have guardrails somewhere in the context, perhaps using a made-up instruction role.
>>101191927out of the loop fag here: holy shit what is this
>>101193552Thanks.
>>101192230>despite herselfFor me, it's "despite her X" at the start of a sentenceGod I fucking hate having a taste of the future and coming away eternally disappointed
>>101194531Models have verbal patterns.Just like how people do.But since everyone is chatting with one of a few of these LLM virtual personalities, they're noticing those verbal patterns.
>>101194689no i mean what's the sourcesomeone released everything they were logging from a proxy?
>>101193848
>>101193869
Looking into it, I think ooba is just using an older llama.cpp, unless something else is really wrong. Can anyone else who uses ooba confirm? I run stheno at 8k context and mixtral at about 20k.
>>101194747
>no i mean what's the source
bruh
>>101191975
>if you wanna download 50 gb of slop be my guest
>https://huggingface.co/vgdasfgadg
>datasets 1-5 at bottom
>>101194773
thank you, i didn't read everything because i was shitting
feeling very happy now that i've stuck to local models the whole time for my embarrassing typefuck slop
>Gemma
>Note that this model does not support a System prompt.
Eh?
stheno fuckin sucks can't believe i wasted time on this
>>101193869
>>101193848
>>101194766
I'm on a 3090. 8B at Q8 in Ooba with Llama.cpp, I got this.
llama_print_timings: load time = 256.92 ms
llama_print_timings: sample time = 9.10 ms / 111 runs ( 0.08 ms per token, 12203.17 tokens per second)
llama_print_timings: prompt eval time = 1461.44 ms / 5049 tokens ( 0.29 ms per token, 3454.82 tokens per second)
llama_print_timings: eval time = 1577.37 ms / 110 runs ( 14.34 ms per token, 69.74 tokens per second)
llama_print_timings: total time = 3516.63 ms / 5159 tokens
Output generated in 3.75 seconds (29.63 tokens/s, 111 tokens, context 5050, seed 1043160308)
>>101194786You're delusional.
>>101194853
>29.63 tokens/s
I must be doing something wrong. Here are my settings. I'm on cuda 11.8, python 3.11, and torch 2.2.2+cu118.
>>101194891I think 30 layers isn't all the layers of llama3... You should try cranking that bar to the max.
>>101194891>gpulayers 30you're on a 4090, right? why aren't you maxxing it out?
>>101194903>>101194910This is why. Thought 33 was the max. Let me test.
>>101194924You can just set it to a bajillion and not mind what the actual layer numbers are.
>>101194924if you know the model will fit, you can just max that out, in my cmd i just put 99 layers for 8bs, on my 3060
>>101194891If setting those layers doesn't work, I'd try no-mmap. Also flash attention.
>>101194891It's probably not relevant for full offloading, but with cpp usually to squeeze a bit more performance you set the threads to your physical core count minus one, and threads batch to your virtual core count.
>>101194924
>>101194931
>>101194937
>>101194940
Alright, getting 40 t/s now with the slider maxed. Guess I'm retarded; I always thought the max I could allocate was 33. Thanks for the help boys.
>also flash attention.
Doesn't seem to really do anything.
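The "just set it to a bajillion" advice works because llama.cpp clamps the layer count to what the model actually has. A hypothetical launch line for a fully offloaded 8B (the binary name depends on your build, and the model path is a placeholder):

```shell
# -ngl 99: request 99 GPU layers; llama.cpp clamps this to the model's
# actual layer count, so overshooting is safe when the model fits in VRAM.
# -c 8192: context size. -fa: flash attention (may or may not help speed).
./llama-server -m ./models/llama3-8b-q8_0.gguf -ngl 99 -c 8192 -fa
```

koboldcpp exposes the same idea via its `--gpulayers` option, which is why anons above just pass 99 there too.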
Erm, anons, if I have the vram, should I run llama-70B or gemma-27b?
>>101194991Has Gemma been fixed already? Or the apps got updated today?
>>101195001I didn't realize it was broken. I saw "Google" and figured it was total dogshit trash, but then I hit ctrl-f and saw someone liked it. So now I don't know what to believe.
sirs, where koboldcpp with fixed gemma2-27b support, sirs?
>>101195046It is trash but not in the way you expect
>model pages without prompt formats>"bro just guess which one works best from the 17 shitheaps I Katamari'd together :)"
>>101195046The 9b kinda worked, the 27b was terrible. Apparently there was some feature that either wasn't set for the model or not respected by the software. I think some people got those factors fixed but I don't know if they've gone live with the fixes.
>>101195062if the model doesn't just work with most formats it's shit
>no mistral model this weekNext Tuesday will be the day for sure this time!!!
>>101193868Wizard 8x22b is actually good, how many of you faggots have even actually ran it without gimping it into oblivion with shitty quants?
>>101195181>implying we have an option to run it full but quant it down anyway
>>101193809
>model alone
Why are so many people focused on trying to do things with just models by themselves? Can we stop expecting the magic mult matrix to do everything and start looking at it as part of the grander system?
>>101194805
What they mean is that having a system prompt usually ties the filtering to that prompt, and changing the system prompt often bypasses the filtering, so they just didn't bother with one. Think about it: if all the filtered and censored training data starts with "You are a large language model" shit, the model associates filtered and censored output with that phrase. Take it out, and it forgets that it's supposed to be censored.
>>101194924
So you can't even trust that shite program to tell you how many layers there actually are?
>try llms once every few months>everything still sucks>repeat
>>101195324prompt issue
>>101193668
Sorry you're having a bad day, mentally ill miku-hating anon.
>It has the same refusals
I think it could be that Mag-72B just doesn't like your blacked fetish. It was fine with all the non-vanilla shit I threw at it, and never refused where Qwen2-72B did. Also noticed that Qwen2-72B had a weird hang-up about consent that was completely absent from Mag-72B.
I find it funny the agpl sharteen has done more to harm it than anyone else in these threads
>>101191975if you're still here try a search for "keening"
Does anyone else here have the "issue" when running Llama.cpp with a large (and slow) model and long context that token gen doesn't start instantly when continuing a cached prompt? I just got back into the hobby and I feel like in the past it was supposed to be instant. I mean why wouldn't it? The cache is already in memory, so it should just begin token generation immediately without processing anything, right? Not sure if that's just what it's like nowadays or I'm doing something wrong here. I mean, it is fast, but it's not instant like I remember it being.
>>101195573>issue>llama.cppyeah that's an issue alright
>>101195573Is it the bug where you have to make the cmd window active before it moves from 'processing prompt' to 'generating' even if the former is at 1/1 tokens and completes instantly? If so, yes, and no, I don't know how to fix it because hours of searching didn't find anything relevant at all.
>>101195485>someone who spams blacked must love blackedI am happy it gets under your skin and you have to cope you piece of shit troon.
>>101193868
>>101193906
>>101193954
What if they benchmarked this differently: instead of relying on one "good/authoritative" model to evaluate the rest, ask every major/foundational LLM to score every other one, maybe even themselves. Then average out all the results to get a more well-rounded and fair view (I'm not necessarily arguing a better one, but at least a way fairer one)...
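That averaging scheme is simple to sketch. The scores below are made up, and the judge/contestant names are placeholders; the only real choice is whether a model's score of itself counts:

```python
# judge -> {contestant -> score}; every model scores every other model.
scores = {
    "A": {"A": 9, "B": 6, "C": 7},
    "B": {"A": 7, "B": 8, "C": 6},
    "C": {"A": 8, "B": 5, "C": 9},
}

def averaged(scores: dict, skip_self: bool = True) -> dict:
    """Each contestant's final score is the mean over all judges,
    optionally dropping its own self-assessment."""
    out = {}
    for contestant in scores:
        vals = [row[contestant] for judge, row in scores.items()
                if not (skip_self and judge == contestant)]
        out[contestant] = sum(vals) / len(vals)
    return out

assert averaged(scores)["A"] == 7.5                   # (7 + 8) / 2
assert averaged(scores, skip_self=False)["A"] == 8.0  # (9 + 7 + 8) / 3
```

Dropping self-scores matters: judges rating themselves highest is exactly the "ask the judge to rate itself" complaint raised upthread.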
>>101195503
>>101195626No? I can have the window minimized and it still works.
>>101195675Damn. No idea then, sorry anon.
(chatbot) I just got a 7800X3D + 48GB RAM + RTX 4090 (24GB GDDR6X). How do I use it effectively? What size models should I be targeting, and should I be using KoboldCPP to run it? Maybe I'd like to try image generation too.
>>101195181I've run it at Q4_K_M but was rather unimpressed. But that's about the limit I can run it full GPU with a decent amount of context.
>>101195665>>101195638the blacked spammer was the believe guy huh. sad
>>101195138Mixtral 2 will SHOCK everyone
>>101195690King of vramlets mode.
>>101195818i am NOT the nigger spammer doever
>>101195638>Doesn't deny he loves itkek, cucked
>>101195638Post Theme:https://www.youtube.com/watch?v=4SiiRx7GDzI
>>101195485Now try again with the default system prompt or a basic one. You'll have to jailbreak it by adding more context.
>>101195878>loves onions and winecheck>dick's not bigcheck>my girlfriend fucked my friendcheck>i'm a cuckcheck>offended all the timecheck>terminally onlinecheckYeah it all checks out for blacked anon.
So why doesn't Gemma 2 work with 8K context? Wasn't it a hybrid of 8K global attention and 4K sliding window?
>>101195878>isn't miku>isn't vocaloid...
>>101195919https://www.youtube.com/watch?v=lM_Hu8mdNOI
>>101195909Better question is why is 8k context acceptable at all. We are actually regressing. We went from 2k->4k->8k->16k->32k to 8k again. Next time they will throw 4k at us and we will eat it like hungry dogs. Grim.
>>101195573
Continuing this investigation: I see in the terminal window that I get 1 token processed each time I continue prediction. Perhaps that's why there's a small delay and it isn't instant, as it's inserting some kind of token into the context and maybe throwing it out. Anyone else notice this? Or does yours properly say that 0 tokens were processed?
make your own model and you can make the ctx as long as you want
>>101195962
Maybe it's the context shift at work? It deciding which part of the context to keep and which to eliminate, and on which end, before continuing? I don't think I have seen that behavior myself, however.
write your own story and you can make the context so long you can even go back multiple chapters
>>101195873>no u kek
>>101196002>still doesn't deny itlel
>>101196007Not that anon, sweaty. It just funny to see you cope and seethe from that thread schizo's actions.
>>101196024>sweaty
>>101195977I just tested it at low context and even then it says that it processes 1 token. I don't see any flags for context shift in the docs for server, which I'm using. So when you continue a cached prompt, it does say that 0 tokens were processed?
>>101196036Not my problem, you can see that word all across 4chan.lurk moar newfag.
>>101196024>jerking off to gay blacked porn right nowdown bad, also like >>101196036 said, you mispelled sweety, faggot :^)
>>101196050>I was always meant to say sweetyfull on cope train is achuggin' with this one, boys.
>>101196057You are trying too hard, and clearly hallucinating things now.
>>101196070>copes so hard he has to make up cope excuses for mePPPPPFFFFFFFFFFFFFF
>>101196070>"I'm not coping.">copesLMFAO, are you sure you aren't Biden? Do you know where you are? What day it is?
>>101193668
Anon, let me remind you, this general is an /aicg/ knockoff. /aicg/ is known to be full of troons and piss drinkers; they do that to get access to proxies and cloud models' API keys, so all the newfaggotry and passive-aggressive attitude is the result of crossposters.
>>101196080>>101196085Least obvious samefag trying to get someone mad, you must have a really sad life if this is the only source joy for you.
>>101193668>>101196100Have a Miku!
>>101188248
>longer context>but nothing uses it properlyGOD
>>101196112Take the advice of the picture, schizo
>>101195953
Meta did say they were working on long-context tunes. Unfortunately, Meta isn't always true to their word (remember L2 34B?), and even if they do release it, it's pretty clear Gemma 2 wins on the intelligence front, so it'll feel pretty pyrrhic
>>101196112
>coping so hard that he is being called out
damn, anti-mikus really are the troons of this general. That's rough...
>>101196112
>wahhh, I'm not mad, wahhh
kek, the pathetic mewlings of a zoomer /pol/lack
>4am>/lmg/ is alivefun.
>petr* just leaked their timezone...
>their
>>101196169>pol out of nowhere Ah, so i am talking with unironic tourists here, it explains this >>101196100 perfectly.>>101196178No it just two samefags.
>>101196044
Just gave it a try. I started a new chat, sent a first message with quite a few tokens, waited for the model to respond, sent another message, waited for the response, then rerolled the model's response. Inference started instantly (as in, there was no pause), but looking at the debug output, there's the following:
>INFO [ update_slots] we have to evaluate at least 1 token to generate logits | tid="106340" timestamp=1719630855 id_slot=0 id_task=1533
>INFO [ update_slots] kv cache rm [p0, end) | tid="106340" timestamp=1719630855 id_slot=0 id_task=1533 p0=4933
>VERB [ update_slots] prompt processing progress | tid="106340" timestamp=1719630855 id_slot=0 n_past=4934 n_ctx=32768 n_tokens=1 progress=0.0002026753209065646
>VERB [ update_slots] prompt done | tid="106340" timestamp=1719630855 id_slot=0 n_past=4934 n_ctx=32768 n_tokens=1
So yeah, it does process 1 token from the prompt on purpose, even with the cached context.
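That "we have to evaluate at least 1 token to generate logits" line is the whole story: the KV cache stores keys and values per position, not logits, so sampling a continuation needs at least one forward pass over the final prompt token. A toy illustration of the accounting (a simplification, not llama.cpp internals):

```python
# Toy model of prompt caching: the cache remembers which prefix tokens
# were processed, but logits are not stored, so continuing even a fully
# cached prompt still costs one forward pass over the last token.
def tokens_to_evaluate(prompt: list, cache: list) -> int:
    common = 0
    while common < min(len(prompt), len(cache)) and prompt[common] == cache[common]:
        common += 1
    # Everything up to `common` is reused, but sampling needs logits
    # from at least the final prompt position.
    return max(len(prompt) - common, 1)

prompt = ["Once", "upon", "a", "time"]
assert tokens_to_evaluate(prompt, cache=[]) == 4      # cold start
assert tokens_to_evaluate(prompt, cache=prompt) == 1  # fully cached: still 1
```

On an 8B that one-token pass is imperceptible; on something like CR+ it's the ~1 second delay described above.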
>>101196112
>you must have a really sad life if this is the only source joy for you.
Oh the sweet, sweet irony: you post the same pictures and the same sentences day in, day out, yet have the audacity to say this. You're terminally online and we both know it.
What's the best model for large conversations that don't forget my points or my ideas? Seems like most AI chatbots forget what I'm talking about or the details of the convos after a while. Read it had to do with context.
>>101196191KEK, I love how you pull the same copes out everytime like they mean anything, its okay to be buttblasted, you are constantly upset because your are terminally online.
>>101196196But enough about resident mikufags, they do just that. I see no difference from your local blackedfag, or petrafag, both are shit.
Is gemma good or can I go back into cryo sleep?
>>101196191>resident schizo has to cope with reality because its buckbreaking him.At some point do you just ever feel ashamed of yourself or are you so autistic that it just never registers? I get second hand embarrassment from a lot of your posts.
>>101196213yes
>>101196213It's the new meta instead of Meta
>>101196212>picrelIt's you when your cope reality gets destroyed, everyday, everytime, over and over again. But we both know you like buttstuff and you like the booty blasting, don'tcha?
VOTE VOTE VOTE>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboardhttps://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboardhttps://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
>>101196212this is the most autistic thing I've read in this general the whole day... don't forget to vote for biden, btw.
>>101196225Such a nice way to prove me right.
>>101196233>instantly falls for the trapkek, got him, you can't help yourself.
>>101196231>muh vooting Really?
>>101196233>posted a picture>"Uhh you just proved my point."Stop, this is actually pathetic.
>>101196233>muh ebil pictureKEK
>>101196213Wait a week until all bugs are discovered and fixed everywhere.
>>101196233Some people just see everything as dildos even if they don't resemble them. Mental illness at its worst.
>>101196192Huh, so maybe it's normal? When I try it on an 8B, it feels pretty much instant, but the model I'm actually trying to use (CR+) gets a much slower speed, and there it doesn't feel instant. It's about a 1.2 second delay on that, while on 8B it says it's 0.020 seconds (basically instant).
>>101196269>NOOO YOU POSTED A MIKU PICTURE, YOU JUST PROVED HIM RIGHT!!! HE IS GONNA BE PROOOOOOVVVVVEEEEDDDD!!!!!!!!!!!!!!!!!!!!!dude is probably ejaculating all over the place the moment he saw you post.
>>101196279Yep. Seems to be normal.
>>101196280
He hasn't responded yet, so maybe he did and fell asleep, lol.
>>101196290You talk about dildos out of nowhere and using it as argument here, either bait or extreme case of schizophrenia.
hi petra
>>101196300>no uWhat a sad attempt at a comeback...
>>101196307kek give him a break he just came after seeing that second miku post.
>>101196280Anon, i have some standards, your waifu is shit and outdated.
>>101196305*hi creep
>>101196317
It's not fun when they are so helpless that they can't form their own thoughts and only throw back what they are originally accused of. That 10-point IQ drop in the west has really destroyed any sort of public forum.
>>101196330kek, it happens when you are mutt. probably half-hispanic, half-black, or half-arab.
>>101196317Anon, you seem lost, i am not that blackefag. Your shit waifu is not worth any actions from me.
>>101196218Yes to cryo sleep or yes to gemma being good?
>>101196213Go back to cryosleep, joogle's shitstain that is gemma cannot be good.
>>101196354Yes
>>101196353No one cares about your opinons if you aren't interesting, you can barely form a sentence that is coherent, let along interesting enough to read.
>>101196353>t. attention parasite
>>101196152
It's not July yet, which is when the models were originally slated to be released. I get it, you're sour from them not releasing the 34B (which was probably botched, given what people said about the code version), but this kind of thing applies to all companies, and most big companies do follow up on most of their big announcements. Also, a 32k-or-more L3 8B wouldn't be pyrrhic at all. A lot of us have little use for low-context models, and VRAMlets have been starving for smart models that are also long-context, as neither L3 (current) nor Gemma goes above 8k. But if a 32k 8B comes out, they'd finally have something. Honestly, rather than anyone winning, it feels more like they're converging on even power levels: Gemma is smarter, but only after launching later than L3; L3 V2 might still not be as smart, but it'll have longer context. Perhaps even Mistral will come out next with a new small model that's competitive again (lol).
>>101196375>no one caresNah it just you tiktok zoomer with 10 seconds attention span. >>101196385But enough about resident avatarfags.
Would the model deteriorate in quality if I go
{{user}}: speech *action*
{{char}}: "Speech" action
? I don't see much of a difference with either
>>101196412>Nah it just you tiktok zoomer with 10 seconds attention span.>"Its' le zoomzooms, It's le zoomzoomz!!!"Like I said, this is pathetic, boring, and just shows you are low IQ. Stop responding to me with your retarded mewling. Your opinion here doesn't matter, pretend that this is real life if it helps.
>>101196412lmfao just keep dancing monkey
>>101196453
I don't give a single fuck about your whining, in case you haven't noticed. I just know you are neither of the things you claim to be, i.e. high-IQ or interesting.
>>101196461
Wow, that's some redditor-level edginess. pathetic even.
>>101196493I saved time and didn't read any of this, now you can stop shitposting the thread about your low test woman problems, we can get back on topic.
>>101196112How the fuck do you think you're the one in the right here? You're a complaining faggot and complaining is for women. Miku is thread culture, sorry that not every board is random anime and porn spam.
>>101196493>Wow thats some redditor-level edgyness, pathetic even.LLLLLLLMMMMMMMMFFFFFFFFAAAAAAAAAAAAAAAAAAAAAOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOyou don't even have one original thought in your head. you are the NPC, you are what you complain about, but like most redditors you don't even see it.
>>101196429No, you'll just get inconsistent formatting.
>>101196493you type like your dad fucked you hard up your ass for a number of years.
Me hungry. Going eat!
goodnight lmg
>>101196493>Everyone I don't like is a MikuposterThis is unhinged, you need help.
>>101196429Depends on the model's size and training, and possibly further fine tuning.
>>101196550stop going to sleep
>>101196024Obvious samefag is obvious
>>101196766hi anon where were u
>>101195821I believe you.>we're only on 0.3 so far2mw more like 2my... It's over...
recommended settings for gemma?
>>101196906Is there a way to run gemma-27b on ooba yet?
>>101196957>oobaabandonware.
>>101196967the fuck?
re: Nous-Hermes2Pro, with function calling: if function definitions need to be in the system prompt, and the system prompt is set in the modelfile, and you create the model based on the modelfile, does that mean I can't have dynamic functions? Say I add a SetTimezone() function; I'd have to stop the model in ollama, alter the modelfile, re-create the model from the modelfile, and run the new model? Can I just include the function definition in the prompt?
>>101196974
>https://github.com/oobabooga/text-generation-webui/commits/main/
>he started committing 5 days ago
oh nevermind
>>101196957
Manually update transformers in your ooba's venv, and turn off do_sample
P3tr#'s timezone is UTC+1 given this post: >>101196178This will be important at some point.
>>101197093>oh no
>>101197169>>101197169>>101197169
>>101196976
nvm, it works fine to define the functions in the prompt body
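For reference, a sketch of injecting a tool definition at request time instead of baking it into the modelfile. The SetTimezone schema is hypothetical, and the `<tools>` wrapper follows the Hermes-2-Pro convention from its model card (the model replies with `<tool_call>{...}</tool_call>`); double-check the exact wording against the card, since these models are sensitive to the trained format:

```python
import json

# Hypothetical SetTimezone schema; any JSON-schema-style definition works.
set_timezone = {
    "name": "SetTimezone",
    "description": "Set the user's timezone",
    "parameters": {
        "type": "object",
        "properties": {
            "tz": {"type": "string", "description": "IANA name, e.g. Europe/Berlin"}
        },
        "required": ["tz"],
    },
}

def build_system_prompt(tools: list) -> str:
    # Hermes-2-Pro expects tool schemas inside <tools> tags in the
    # system prompt; rebuilding this string per request gives you
    # dynamic functions without touching the modelfile.
    schemas = "\n".join(json.dumps(t) for t in tools)
    return (
        "You are a function-calling AI. You may call any of these tools:\n"
        f"<tools>\n{schemas}\n</tools>"
    )

prompt = build_system_prompt([set_timezone])
assert "SetTimezone" in prompt and "<tools>" in prompt
```

Since ollama lets you pass a system message per request, swapping tools becomes a prompt-construction problem rather than a model-rebuild problem.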
>>101197050Are those parameters right for loading? I'm getting nans for probabilities.