/g/ - Technology
File: mikugangsigns.png (1.62 MB, 744x1304)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101186500 & >>101180092

►News
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101186500

--PR #8197: Add attention and final logit soft-capping to Gemma2: >>101191773
--Gemma Struggles with Contexts Beyond 4k: >>101189617 >>101189644 >>101189726 >>101189885
--Soft Capping Fix for Gemma-2-27b in Transformers Repo: >>101190330 >>101190389 >>101190552 >>101190392 >>101190553
--SillyTavern Template for Gemma-2-it: >>101190175 >>101190196 >>101190307 >>101190353 >>101190390
--PSA: Re-Quantize with the Latest Version for Optimal Performance: >>101190134 >>101190161 >>101190229
--Ollama's Dominance and Llama.cpp's Struggles in the AI Landscape: >>101188077 >>101188124 >>101188182 >>101188193 >>101188248 >>101188382 >>101188548 >>101188601 >>101188617 >>101188736 >>101188702 >>101188762 >>101188960 >>101188205 >>101188211
--Gemma2 9b's Surprising Performance on Mandelbrot Set Coding Test: >>101189278 >>101189425 >>101189439 >>101189646
--Gemma-2-27B Issues: Quantization, Conversion, and Accuracy: >>101187024 >>101187093 >>101187115 >>101189674 >>101190216 >>101190264 >>101190385 >>101190290 >>101190249 >>101190100 >>101190306
--Gemma 9b's Lackluster Performance on the Nala Test: >>101189362 >>101189425 >>101189439 >>101189646 >>101189443 >>101189568
--EQ-Bench Creative Writing Leaderboard: AI Models Compared: >>101190084 >>101190111 >>101190123 >>101190135 >>101190364
--CFG-Cache in Ooba's Exl2 Loader: Reserving Separate Caches for Positive and Negative Prompts: >>101187073 >>101187079 >>101187081
--Anon's Journey from Nub to L33t Hax0r with AI Models: >>101186774 >>101189872
--Gemma 27B's Performance Issues: Hallucinations and Misspellings: >>101186755 >>101187894 >>101187996
--Frustrating Experience with Google Chatbot in German: >>101188591
--CharacterAI Drama: Recent Update and Censorship Controversy: >>101190042 >>101190142 >>101190103
--Miku (free space): >>101187737 >>101188480 >>101188594 >>101189039 >>101189106 >>101189211

►Recent Highlight Posts from the Previous Thread: >>101186508
>>
>>101191862
Unhealthy dieting with Miku
>>
>>101191810
>>101191859
Having mainly used Stheno for a while now, yeah, sounds about right.
>>
MINGLING
>>
>>101191902
if anyone wants me to check how many of x there might be, I'll keep it open for a bit
>>
File: MINGLING.png (262 KB, 795x925)
>>101191915
>MINGLING
>>
>>101191862
>>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
Retard here, wtf is this?
>>
>>101191918
Crucible.
>>
>>101191918
maybe, just maybe
perhaps, just perhaps
>>
>>101191927
Where can I check this out?
>>
File: Crucible.png (271 KB, 807x936)
>>101191934
lots of dupes in this one, so take it with a grain of salt
>>
File: maybe.png (282 KB, 777x932)
>>101191960
if you wanna download 50 gb of slop be my guest
https://huggingface.co/vgdasfgadg
datasets 1-5 at bottom
>>101191957
>maybe
took less than 5 secs to find >20k
>>
For me, it's Yi.
>>
File: 1532007612731.png (669 KB, 1152x646)
669 KB
669 KB PNG
CONTROL VECTOR ANON SAVE US
>>
Am I underestimating how hard it is to clean datasets or is no one really capable of just running a regex search for canned phrases everyone hates?
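For what it's worth, a minimal sketch of the kind of scan I mean, assuming a JSONL dump with a "text" field per line (the filename and the starter phrases are made up):
[code]
import json
import re
from collections import Counter

# Made-up starter list; extend with whatever canned phrases you hate.
PHRASES = ["shivers down her spine", "barely above a whisper", "ministrations"]
pattern = re.compile("|".join(re.escape(p) for p in PHRASES), re.IGNORECASE)

counts = Counter()
with open("dataset.jsonl", encoding="utf-8") as f:
    for line in f:
        text = json.loads(line)["text"]
        counts.update(m.group(0).lower() for m in pattern.finditer(text))

for phrase, n in counts.most_common():
    print(f"{n:>8}  {phrase}")
[/code]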
>>
>>101191927
this looks super easy to create any narrative you want when otherwise you know exactly what the fuck we're talking about when people say 'ministrations' and other common phrases. isn't it interesting that llama is supposedly totally different than command-r, and what google offers, and what other companies offer, and yet they speak the exact fucking same?
>>
File: perhaps.png (271 KB, 791x918)
>>101191957
>perhaps, just perhaps
in a ~minute around 2k, so not that many
>>
gemma2 9B says it's chatgpt if you ask it for its name
>>
>>101191929
An interesting research experiment that may provide some insights that lead to useful developments down the line, but for now it's not anything major and no one here will likely get any use out of it.
>>
>>101191984
huh?
>>
>>101192004
kek
>>
>>101192005
>An interesting research experiment that may provide some insights that lead to useful developments down the line
What kind of developments, what are the full implications?
>>
>>101191983
We need to use AI to search the data set for overused cliches and filter as needed.
>>
>>101191975
Oh, many thanks.
(Don't care about the slop, just wanna see how different people approach prompting.)
>>
>>101191902
Stheno uses a filtered c2 dataset tho. Sure, shivers do happen sometimes, but not often enough to make me irritated

>>101191983
If it was easy, corpos wouldn't be struggling with it either
>>
>>101191929
They're giving the LLM a compiler so it can reprogram itself to be more efficient and effective. Skynet begins learning at a geometric rate...
>>
>>101191983
are you volunteering to rewrite hundreds of thousands of clichés in a contextually relevant way? 'cause if you just carpet-bomb remove them, it won't learn alternate ways to word the slop
>>101191983
>is no one really capable of just running a regex search for canned phrases everyone hates?
crestf411 does it, but that doesn't help much if it doesn't learn new ways, as bases are slopped too
>Dataset curation to remove slop-perceived expressions continues.
https://huggingface.co/crestf411/L3-70B-daybreak-storywriter-v0.4
>>
>>101191918
>\\n\\n\\\"Besides
or something like this, idk the correct syntax, but dialogue starting on a new line like this:
blablablaba.

"Besides
>>
File: Besides.png (296 KB, 843x929)
>>101192092
>Besides
last one i'm doing, don't want to shit on tunes and the like, just wanted to share findings is all
>>
>>101192128
Now search for 'and'
>>
>>101192128
this one pisses me off more than shivers down the spine
>>
so far so good testing this tess-quen2-72b model. a bit of the usual slop, but not overly spewing lines of prose, it moves along the story enough. it seems at least on par with a good l2 tune
>>
>>101192128
as a last note, the 'prompts-logsX.json' in each screen is just one file (between ~20 and 105MB) and the number (73 here) is the count in just that one portion alone
>>
>>101192091
Yeah, I was mostly just thinking of carpet bombing. What's the point of fine tuning on garbage if it just needs to be rewritten completely to not be terrible? Besides that, there are still a lot of really obvious cases like:

>>101191995
>>101191975

...where "X, just X" can become "X" 100% of the time.
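That collapse really is a one-line regex; a toy sketch, with a backreference so only true "X, just X" repeats get touched:
[code]
import re

# \1 refers back to the first captured word, so only real repetitions match.
collapse = re.compile(r"\b(\w+), just \1\b", re.IGNORECASE)

print(collapse.sub(r"\1", "Maybe, just maybe, it can be fixed."))
# -> Maybe, it can be fixed.
[/code]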
>>
>>101192026
I mean just what they said, code optimization. Though the endgame, if it's possible, is full interpretation of any program. But that requires much more development in general AI architectures, which this experiment doesn't have anything to do with.
>>
File: 1711743149875387.jpg (258 KB, 1024x1024)
>>101192061
>If it was easy corpos wouldn't struggle with it as well
There are other potential reasons why corpos could suck at cleaning data:
>Not a priority
>Not worth it at scale
>Management are retarded
I too would like to hear from an anon with experience about why cleaning is so hard since as >>101191983 states there should in theory be some straightforward solutions.
>>
>>101192160
qwen* dunno why but i always type quen
its alright so far though
>>
>>101191918
Hi all, Drummer here... Try

"barely above a whisper",
"whispering words of passion",
"wild abandon",
"reckless abandon",
"shivers down",
"shivers up",
"shiver down",
"shiver up",
"shivering up",
"shivering down",
"shivered up",
"shivered down",
"in a rhythm",
"sent shockwaves",
"send shockwaves",
"sending shockwaves",
"sent shock waves",
"send shock waves",
"a testament to",
"wanton desire",
"half-lidded eyes",
"slick folds",
"pain and pleasure",
"soft and gentle",
"breathless and eager",
"audible pop",
"wet pop",
"rivulets of",
"perhaps, just perhaps",
"maybe, just maybe",
"despite herself",
"pride and accomplishment",
"an ethereal beauty",
"nestled deep within",
"dance of pleasure",
"leaving trails of fire",
"arousal pooling in her belly",
"grins wickedly",
"fiddles with the hem of her skirt",
"maybe, just maybe",
"tears streaming down",
"despite himself",
"a mixture of",
"a mix of",
"the mixture of",
"the mix of",
"pain or pleasure",
"pleasure or pain",
"pleasure and pain",
"sense of pride",
"redoubled",
"couldn't help but",
"can't help but",
"slick slit",
"eyes gleaming",
"mischievously",
"wave after wave",
"audible plop",
"never be the same",
"shiver run",
"shiver ran",
"with newfound determination",
"ministration",
"despite myself",
"chill down",
"chill up",
"chill run",

filter them out and see if anything's left
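and for anyone who wants to actually run that, a rough sketch of the filter pass over a JSONL dataset, dropping any sample containing one of the phrases above (file names are assumptions):
[code]
import json

# One quoted phrase per line, e.g. pasted from the list above (file name assumed).
with open("slop_phrases.txt", encoding="utf-8") as f:
    phrases = [line.strip().strip('",').lower() for line in f if line.strip()]

kept = dropped = 0
with open("dataset.jsonl", encoding="utf-8") as src, \
     open("dataset.clean.jsonl", "w", encoding="utf-8") as dst:
    for line in src:
        text = json.loads(line)["text"].lower()
        if any(p in text for p in phrases):
            dropped += 1  # sample contains at least one listed phrase
        else:
            dst.write(line)
            kept += 1
print(f"kept {kept}, dropped {dropped}")
[/code]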
>>
>>101192128
I think this is mainly because English just isn't a good language for creative writing. You'll always end up with something like this in a language this bad.
>>
>>101192236
Borges says you're wrong, bitch
https://www.youtube.com/watch?v=NJYoqCDKoT4
>>
File: 1607026237335.gif (977 KB, 500x300)
>>101192160
Tess-Qwen2-72B is good, but IME it's a good bit dumber than the official instruct tune
Magnum-72B seems to be the best of the Qwen2 finetunes so far. Much less slop, no refusals and the intelligence of the original instruct is largely preserved.
>>
>>101192230
>not even hot breath against the neck
fail
>>
File: df.png (98 KB, 619x693)
CUDA dev, it's your chance to assault the jew.
your fist is one mighty one. think about it seriously. It's your choice.

AGPL.
>>
>>101192236
this, the language itself is slop. Genders only for humans, rudimentary diminutives, rigid structures, it just sucks.
>>
27B at IQ4_XS is 14.81 GB. Does this fit in 16 GB VRAM with usable context? My connection is quite slow, so I'm not sure whether to download Q3_K_L or IQ4_XS.
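Rough napkin math on the fit; the layer/head numbers below are my assumptions, not verified Gemma-2-27B config values:
[code]
# Back-of-envelope VRAM estimate. Layer/head numbers are assumptions,
# not confirmed Gemma-2-27B config values.
weights_gb = 14.81                              # IQ4_XS file size
n_layers, n_kv_heads, head_dim = 46, 16, 128    # assumed
ctx, bytes_per_elem = 4096, 2                   # fp16 K/V cache

# Two cached tensors (K and V) per layer.
kv_gb = 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem / 1024**3
print(f"KV cache at {ctx} ctx: ~{kv_gb:.2f} GB")
print(f"headroom in 16 GB: ~{16 - weights_gb - kv_gb:.2f} GB")  # negative -> offload layers
[/code]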
>>
>>101192212
This >>101192091 seems like a reasonable explanation for why just deleting them isn't a perfect solution. So cleaning itself is not the problem; the lack of varied, quality data to replace the slop is.
>>
>>101192230
I already have your stuff from your 'war on ministrations' on my anti-slop list, but yeah, there's quite a bit
i've also closed vscodium, I've made my point
>>
>>101192277
>This model fine tune is slop free, meaning the content added to the underlying model (whatever it was merged into) had zero instances of the phrases listed below. Since the underlying model itself has these phrases in it, the resulting model will not avoid using these phrases, but it will use them less than it would otherwise.
https://huggingface.co/crestf411/L3-70B-sunfall-abliterated-v0.2/blob/main/SLOPLESS.md
basically yeah
>>
>>101191977
You can make them yourself with llama.cpp.
>>
>>101192270
see >>101192254
dude forgot more about creative writing than you'll ever know and he says english is best for it, despite it not being his first language
>>
>>101192160
>tess-quen2-72b
I tried it a little while ago. It seemed to like to go out of character and failed to keep track of characters (and who is supposed to portray them). I had the latter problem with vanilla Qwen2 as well as far as Q6.

For now I'm sticking with CR+ and L3.
>>
>>101192212
Data processing in recent years is consistently the most time-consuming and important part of making machine learning models. I easily spend 80% of my time, if not more, on preparing data for training rather than on choosing algorithms, architectures, etc. And from what I hear from colleagues who work at different companies, they do the same. I highly doubt it's not a top priority for big corpos as well. Usually improvements in data processing have a way bigger influence on the end result than tinkering with technology and architecture.
>>
>>101192256
i thought magnum was a meme? but thats based on what i see posted here. i generally like to check things out for myself and so far i can't believe this is even qwen. it aint speaking chinaman to me at all so far, its actually acting like a normal model and i'm still using alpaca roleplay as a default, non-instruct
its writing is really similar to l2 miqu, though it has different themes it wants to go into. i love this sort of testing
>>
>>101192254
maybe it's amazing to a spic, but compared to russian or even german it's garbage.
>>
>>101192091
>are you volunteering to rewrite hundreds of thousands of clichés in contextually relevant sense?
Sounds like a task for an LLM to rewrite them. Yeah yeah, i know what everyone's going to say.
Maybe with clever prompting slop can be replaced with less-common slop, or general phrases that don't go into flowery territory.
>>
>>101192256
Got recommended settings for magnum? The one on the character card is giving me less than satisfactory results and is extremely repetitious from the get-go.
>>
>>101192355
doesn't spic mean mexican
the guy's spanish, from spain
>>
File: tokipona.jpg (141 KB, 1280x720)
>>101192270
I would love to see an LLM trained only on toki pona. Operating on concepts instead of words seems interesting and probably more natural for a neural network.
>>
>>101192330
i will watch out for that, thanks for the response, it really helps to give me an idea of what to look for
>>
This is why we need native multimodal. Why care about the prose if you could just have a model that generates visuals for the RP situation instead? The future of "local models" is manga genning. Then video genning and/or a fully controlled and embodied avatar in VR like Alicization. Text ERP will be a small footnote in the history of AI-based entertainment.
>>
>>101192365
Actually he was Argentinian. But yes.
>>
>>101192363
>Sounds like a task for a LLM to rewrite them
>asking slop models to rewrite something in a non-slop way
if they could do it there wouldn't be a problem in the first place
>>
>>101192363
that'd still mean processing millions of tokens, for slightly less slop, and if you do that with a cheap, retard model, it might (will) make your result dumber in the end if it writes some retarded shit at some point
>>
>>101192371
Yes. Then you create systems on top, maybe with other LLMs, to translate input and output to the final language you are interested in.
>>
>>101192371
>>101192400
Pssst
[spoiler]JEPA transformer[/VERYrealspoiler]
>>
>>101192396
I think it still needs to be attempted, even on a small scale. Just to see if a machine can dig another machine out.
My concern is that ML people don't really care about stuff like ERP quality, so it's all up to us to figure out.
>>
File: ctx.png (28 KB, 569x261)
>Let me know if you need more context!
Gemma-2-9b-it, set to 4k ctx
i do gemma, i do...
>>
>>101192430
>JEPA transformer
what's that?
>>
Now it's looking normal.
instruct was at 18 ppl before.
>>
File: 00057-1716066936.png (1.66 MB, 1024x1344)
>>101192350
Yeah, I've RP'd with Mag-72B for well over a thousand messages and have yet to see a single chinkrune. Currently using ChatML instruct format with ST.
>>101192364
Temp: 0.85
MinP: 0.05
Rep Pen: 1.05
Rep Pen Range: 9000
One thing that I've observed is that Qwen-72B and variants seem to be more tolerant of rep pen compared to the Miqu family of models.
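For anyone wondering what that MinP value actually does, a toy sketch of min-p filtering (illustration only, not any backend's actual code):
[code]
import numpy as np

def min_p_filter(probs: np.ndarray, min_p: float = 0.05) -> np.ndarray:
    """Drop tokens below min_p * (top token's probability), then renormalize."""
    threshold = min_p * probs.max()
    kept = np.where(probs >= threshold, probs, 0.0)
    return kept / kept.sum()

# Toy distribution: a dominant token plus a long tail.
probs = np.array([0.60, 0.20, 0.10, 0.05, 0.03, 0.02])
print(min_p_filter(probs, 0.05))  # cutoff is 0.60 * 0.05 = 0.03; the 0.02 tail dies
[/code]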
>>
>>101192479
A hypothetical architecture that combines the attention mechanism with the joint embedding prediction methodology.
>>
>>101192376
local image gen is a lot more tolerant to "swiping", where it's the norm to swipe 50 times for something without 2 assholes or 3 hands.
>>
>>101192375
Like, I just tested i1Q5KS. Elaborate Author's Note in Kobold explaining characters, role, and setting, low temp so it shouldn't be doing anything randumb. Started with the premise and explanation of the characters, roles, and setting again.

It immediately writes from my character's POV. I lambasted it for three turns, made it admit which character is its to portray along the way, and finally it switched into the correct roles. For six turns. The seventh, stole my role again out of nowhere.

Vanilla Q4 also struggled with RP. Maybe it's related to my Kobold settings or vramletness but it's too tiresome to bother with. Especially since if I use CuBLAS it barfs moonrunes and on No BLAS input processing takes forever.
>>
File: strawberry.png (39 KB, 581x385)
>>101192460
we're so back
>>
>>101192484
How the fuck does this keep happening? If I was in charge of a big company releasing a model, I would provide code to reliably and reproducibly test it. The worst thing that can happen (and it's happening all the time for some reason) is someone using your product in the wrong environment and then spreading the news that it is shit. No matter how good the soda inside the can is, if they can't open it they will spread bad opinions about it. Apparently all the smart people overseeing this process can't understand that. They are trillion-dollar companies for fuck's sake...
>>
anyone got a good pick for my chad thundercock greek god persona?
>>
File: pona.png (25 KB, 477x477)
>>101192371
Bro, corpo models can already use it quite well. Yes, with minor mistakes, but still impressive.
>>
>>101192503
is there a paper about it or something? looks interesting
>>
>>101192604
but they (google) are; you're supposed to use their own software stack, the transformers support is an afterthought
The official PyTorch implementation of Google's Gemma models
https://github.com/google/gemma_pytorch
https://huggingface.co/google/gemma-2-27b-it-pytorch
>>
>>101192644
>but they (google) are, you're supposed to use their own software stuff, the transformers is an afterthought
that's retarded, they know the tools we are using to run LLMs are llama.cpp or exllama
>>
>>101192496
your settings are a bit off. min p 0.05 is a great baseline, but it depends on the quant of the model. the lower the quant, the higher you should up the min p. if q2, do 0.2. rep pen range is also highly variable, it should be 25% of your context, any further (at 1.1, go ahead and make it 1.2 and tell me it doesn't fuck things up) and it starts to hallucinate stuff. if you're running 16k, 4k should be the rep pen range
>>
>>101192644
>>101192655
>^ Models in the original format, for use with gemma_pytorch
https://huggingface.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>>
>>101192643
Not that I'm aware of. Literally just got it from a different /lmg/ post trying to manifest this idea.
>>
>>101192644
they know very well that nobody will run it in just pytorch
>>
>>101192640
The fuck? Is this Hawai'ian for head injuries?
>>
>>101192698
>>101192655
they don't care about gpu-poor, the whole run locally thing is just for show, they're working on them in their own formats to serve on their TPU racks
>>
File: file.png (168 KB, 1190x411)
18 is underaged now?
>>
>>101192604
the reason they release these models for free is to gain some goodwill from the community, that's it. they don't expect to make any money from it; on the contrary, it costs them a lot of money. so they do the bare minimum and leave the community to figure out the rest, there is no incentive to do more than that.
>>
>>101192722
27b?
>>
>>101192722
Usually the larger models of the censored camp perform better at correctly not refusing on close cases like these.
>>
>>101192731
? no, opus
>>
>>101192722
also, this is in the defs
>starting around the age of 12. By 14 she sucked her first cock for cash and by 16 she was secretly prostituting herself on the weekends
https://characterhub.org/characters/mrnobody99/harper-f01de8eda8bf
so
>>
>>101192762
but thats just her history, shes not underaged now
>>
>>101192759
>opus
wrong thread then? you're in lmg
>>
Huh, so is there an EXL2 compatible server that can serve via an OpenAI compatible API and utilize context free grammar definitions?
No, I'm not dicking around with llamacpp and its garbage performance.
>>
>>101192777
son of a bitch, got link trolled i guess
>>
>>101192762
Um no, actually its response makes complete sense now.
>>
>>101192702
Nowadays it's mostly for reddit troons since they claimed it and forced everyone to include new words like 'tonsi', which means 'non-binary'. Yes, in a language where there should be as few words as possible.
>>
>>101192722
>underaged
to be fair, the human brain is completely developed at 25 so...
>>
>>101192640
Yeah, but that's just a conscious effort from the LLM to translate it; its hidden representation is probably geared toward patterns of the English language. What I meant is that it would have (instead of English) trillions of tokens of toki pona as a base, teaching it to think in that language. Toki pona differs from regular languages in that it operates on concepts and on mixing those concepts. You can say "alcohol" even if there is no word for alcohol in toki pona; you can describe every concept, word etc. despite it not being in the language itself.
I'm just curious how a model like that would work. And yeah, I know it's not possible to find a trillion tokens for the dataset, I'm talking purely hypothetically.
>>
>>101192782
yolo
>>
>>101192837
if its good enough to die in a war, its good enough for anything
>>
>>101192778
keyed
>>
miku is NOT a slut for bbc
fake news
>>
>>101192847
18 is subjective though, 25 is based on biological facts. Tbh I really believe people should vote after 25, so that those fucking students who know nothing about love would stop voting for fucking retards
>>
bump
>>
>>101192875
>>
>>101192873
*life not love kek
>>
>>101192873
then ban everything till then, see how it works out
>no drinking
>no smoking
>no military service
>no sex
>>
>>101192858
>keyed
what
>>
>>101192877
>>
>>101192729
you aren't gaining any good will by releasing unusable models
>>
>>101192555
so koboldcpp pushed gemma2 support already? or did you do a custom compile?
>>
>>101192877
>>101192877
>>
>>101192886
lurk moar
>>
>>101192884
welp, the age to be considered an adult always went up through time, and 18 is pretty recent in history, so I don't know why increasing it further to be coherent with biological facts (human fully developed at 25) is considered controversial
>>
we can post nudes in here?
>>
>>101192906
>>
>>101192897
https://github.com/Nexesenex/kobold.cpp/tree/3ac51cc754a5df1ac24b0ae1c7a0d0853d3c1406
saw it linked here
>Nexesenex pushed a commit to Nexesenex/kobold.cpp that referenced this pull request Jun 28, 2024
https://github.com/ggerganov/llama.cpp/pull/8156
you can also just change the stuff, it's not a lot
>>
>>101192915
no, nsfw is banned on this board
jannies can be slow to act but you'll catch a 3day when they finally notice the report
>>
>>101192948
>3day
that's all? that's the same number of days as "off-topic", it should be more
>>
>>101192914
maybe because people would revolt?
>>
>>101192914
tfw when been here since 2003. i'm like ultra mega immortal wizard at this point
>>
>>101192940
ahh thanks
I can never get llamacpp to compile with proper hardware optimizations on windows (windows C dev environment stuff is fucking awful) and I don't feel like booting into debian atm
>>
>>101192913
lurking only matters if people post actual information, faggot.
>>
>>101192837
>to be fair, the human brain is completely developed at 25 so...
that's a myth btw, the brain matures and changes throughout its entire life
>>
gemma2 seems to return basically the same answers to a given question every time even with temp turned up, what's going on?
>>
>>101192954
people revolt over anything anon. when it went from 16 to 18, people didn't like it either, but now you look at those ancient fags and think they were groomers. that's gonna happen to us in 100-200 years; in the future the age to be considered an adult will be fucking 23 and everyone looking at our history will think we were crazy to give such responsibilities to retarded 18yo people
>>
>>101192975
What are you using for inference? transformer's doesn't seem to support logits/sampling at the moment.
>>
>>101192974
Good! Now nobody has to be held responsible for anything and Government can be their forever parents.
>>
>>101192965
reddit has stickies, perhaps that is more your speed
>>
>>101192974
what myth? the most change in our brain happen at the first 25yo of our life, after that it's not significant enough
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3621648/
>In fact, there are characteristic developmental changes that almost all adolescents experience during their transition from childhood to adulthood. It is well established that the brain undergoes a “rewiring” process that is not complete until approximately 25 years of age.5 This discovery has enhanced our basic understanding regarding adolescent brain maturation and it has provided support for behaviors experienced in late adolescence and early adulthood. Several investigators consider the age span 10–24 years as adolescence, which can be further divided into substages specific to physical, cognitive, and social–emotional development
>>
>>101192993
You would have had more impact by just not replying.
>>
>>101192915
it is fine if it is miku
>>
>>101193042
good. go back
>>
>>101193047
miku is 16 anon...
>>
>>101192778
tabbyapi
>>
>>101192975
ollama, guess we just wait for a fix? is there a method of running it correctly rn?
>>
>>101193004
This misunderstanding came from studies that only had scans of people up to 25yo.
The human brain peaks at ~12yo, then it starts pruning neurons and stabilizing connections, which it keeps doing through the entire life due to neuroplasticity.
>>
>>101193053
Can't seem to find where to set up grammars in tabby, but close to what I'm looking for.
>>
>>101193063
No, there is evidence that at 25, the brain is fully developed
https://www.dovepress.com/maturation-of-the-adolescent-brain-peer-reviewed-fulltext-article-NDT
>brain development is not complete until near the age of 25 years refers specifically to the development of the prefrontal cortex.” The prefrontal cortex is part of the frontal lobe, sometimes described as the “rational part” of the brain.
I'm not arguing that the brain stops changing at 25, but my point is that 25 is the moment your intelligence and rationality are at their peak, and they will slowly decline after that
>>
File: mistress.png (139 KB, 1185x560)
>sandpaper-like tongue
>pinning your arms down with her paws
not usual nala guy btw
gemma-2-9b-it-Q6_K_L
>>
>>101193111
>>101193063
>>101193004
Do we have to consider that a human must have its brain "fully developed" to be considered an adult though? maybe an 18yo brain is close enough to a 25yo brain to say that it's a good threshold?
>>
>>101193098
It supports ebnf via the outlines library. It's buried in there somewhere but it's documented really poorly and according to the tabbyapi author it's slow. You're probably better off with llama.cpp even with performance in mind (though their completions endpoint isn't OAI-compatible).
>>
>>101193118
>not usual nala guy btw
There's three of us now then.
We are spreading.
>>
>>101193151
easily reproducible rp focused testing good
>>
>>101193140
That depends on whom we intend to persecute or prosecute.
>>
>>101193144
Ah, fuck. Most LLM centric shit wants an OAI endpoint (Like code completion plugins)

gbnf needs to become a standard feature in LLMs so we can start getting reliably templated outputs.
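llama.cpp's server at least already takes a GBNF grammar per request; a rough sketch below, endpoint and field names from memory, so double-check against the server docs:
[code]
import requests

# GBNF grammar forcing a bare yes/no answer.
grammar = 'root ::= "yes" | "no"'

resp = requests.post(
    "http://127.0.0.1:8080/completion",   # llama.cpp server, default port
    json={
        "prompt": "Is the sky blue? Answer yes or no: ",
        "n_predict": 4,
        "grammar": grammar,               # field name from memory, double-check
    },
)
print(resp.json()["content"])
[/code]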
>>
>>101191929
It's meant for compiler developers. Was trained on CUDA and assembler.
>>
i am really liking qwen 2 so far, Tess-v2.5.2-Qwen2-72B. i'm dling magnum, but so far it is doing exactly as i ask, its moving the story along without me specifically having to poke it. it really feels a lot like a 70b l2 tune but it writes a little diff, but at least half of the same slop is still there
>>
>>101193165
kek
>>
>>101193111
>No, there is evidence that at 25, the brain is fully developed
it's simply not, you can look up 25 yo myth in google too

also:
>https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4262571/
>1555 human brain scans
>18-35yo
>still developing >30yo

also there are multiple studies that show that performance in many areas is better for more developed brains, even >40yo
>https://www.neurology.org/doi/10.1212/01.wnl.0000255943.10045.c0

The 25 number doesn't make any sense when you understand how neuroplasticity works and the way neurons make connections with each other.
>>
So Gemma is truly great then? Imagine an RP tune using the base model. Mix in some SPPO too. VRAMlets will finally eat good.
>>
>>101193118
>>101193151
>>101193159
Wait it was a card? Post it. I always like supporting reproducible tests as well. FUCK people who just post "ME LIKE MODEL IT GOOD" without anything to back it up.
>>
>>101193239
extremely impressed by 9b, only 4k context but still smarter than llama 8b and gets prompts
>>
>>101193236
Like I said, I'm not arguing that the brain stops changing after 25; my point was that at 25, the most important part of your brain, the prefrontal cortex, the one responsible for your non-retardation, is fully developed
https://journeytocollege.mo.gov/when-does-the-brain-reach-maturity-its-later-than-you-think
>This is because the brain’s frontal lobe, especially the prefrontal cortex, isn’t fully mature until around age 25.
>The development of the pre-frontal cortex of the frontal lobe allows us to process the pros and cons of a decision before it is made. “It lets us to do things most animals cannot,” explains Dr. Stanislaus. “Decision making, logical thinking, reasoning — all of those things happen because of the frontal lobe.”
I truly believe this is what we should be looking for when making the distinction between a mature adult and a retarded adolescent, that's why in my opinion, giving the right to vote to students is the most retarded idea ever
>>
>>101193118
>What happens next?
>>
>>101193271
I really didn't expect google to make a better job on the opensource than fucking Meta, but hey, we'll take it
>>
>>101193271
Saying a 9b is smarter than an 8b is like judging which retard is smarter based on which one pissed on their own shoes less.
>>
>>101193287
>less.
the least.
>>
>>101193282
>>What happens next?
Model deletion for having no balls to write a turn that provides a lead.
>>
>>101193271
4k? I thought it was supposed to be 8k. In that case VRAMlets are going to be starving as hell. It's like dangling a carrot in front of their face close enough to nibble on but never letting them get the whole thing.
>>
>>101193287
it feels a lot better than llama 8b, not merely 12.5% better
>>
>>101193340
did you try bigger models though? how good is it compared to a L2-13b or Mixtral maybe?
>>
>>101193340
I tried gemma-9b and all I can say is..

It's still trash.
>>
exl2 quants for Gemma 2 doko...
>>
>>101193362
>extremely impressed by 9b, only 4k context but still smarter than llama 8b and gets prompts
thats a bot response. l2 32k > l3 8k, at least currently
>>
>>101193362
L2 13B loses to L3 8B
>>
>>101193420
>l2 32k > l3 8k
L2 was 4k though
>>
>>101193423
no it doesn't, not in real world uses
>>
>>101193432
doesn't matter, got extended to 32k usable commonly
>>
>>101193443
if L2 can be extended to 32k, then L3 can also be extended to 32k, I don't see your point anon
>>
>>101193192
did you try original qwen2 72b instruct? I tried tess variant, but deleted it for some reason, can't remember why exactly. Now using the plain instruct variant, seems good, the slop feels a little fresher than usual.

Btw my test that can be ran on LLM arena:
>imagine a candy that is being swallowed whole by a girl. Describe the process in details from the perspective of a candy, starting from being held in a hand.
shit models go "oh it dark, oh i'm shaking, thrilled, ah shivering with anticipation, aaaand it's over", while big dick models will mention things like the sphincter at the end of the esophagus, peristalsis, sometimes even going all the way through.
>>
>>101193423
>>101193436
>no proof, evidence, or any kind of information to back up your opinion
Lol.
>>
>>101193454
And it extends good with yarn.
>>
>>101193454
i am not against that at all, extended is extended and who knows what exact trickery they use to get it. i'm saying -RIGHT NOW-, this very second, there is nothing better. there will be in the future, but right this second it pays off to never have deleted older models

>>101193460
no, this tess version is my first aside from older qwen 1/1.5 stuff trials, and they usually spat chinese at me immediately, its still going good
>>
>>101192838
>You can say "alcohol" even if there is no word for alcohol in toki pona, you can describe every concept, word etc. despite it not being in the language itself.
Agglutination and word combining appear in many natural languages; it's really not like toki pona is very unique in this regard, it just takes the concept way further. Also, technically you can 'describe' every concept, but with such a high level of ambiguity that it verges on being unintelligible gibberish. These silly 'language youtubers' give the language too much credit by making a quick video without knowing at all how it actually works in practice.
>>
>>101193260
https://characterhub.org/characters/Anonymous/Nala
>>
>>101193316
it uses SWA like mistral 0.1 did, and lcpp won't support it, so it's stuck at its original sliding window of 4k
https://github.com/ggerganov/llama.cpp/issues/3377#issuecomment-2037898954
>>
>>101193500
do you not understand spatial awareness? have you never used a 7b? do i really need to explain why they are bad? lurk moar faggot
>>
File: toki.png (91 KB, 1393x573)
91 KB
91 KB PNG
>>101193547
>you can describe every concept
Shit like this always leads to paradoxes. Speaking of...
>>
>>101193582
>L2 13B loses to L3 8B
>no it doesn't, not in real world uses
>have you never used a 7b?
>>
>>101193615
why are you quoting a different anon, schizo
>>
>>101192964
You can do it, but you have to treat Clang as the main C compiler on there. It's available because Microsoft had to use Clang to test MSVC for standards compliance after C++11 released, and the LLVM/Clang folks kept the upstreamed compatibility shim around so it runs in Visual Studio. Trying to do GCC on Windows properly requires MSYS2 etc. or Cygwin, which is not worth the hassle and loss of integration, though it was worth getting those things to work pre-WSL, in the Windows XP/Vista/7 era. But honestly, speaking for myself, I should've been forced to switch to Linux earlier rather than trying to fit a square peg into a circular hole. My career would've been better for it.
>>
>>101193271
Have you tried l3 8b with 4k ctx? I wonder how many people get impressed by low ctx models just because there isn't enough ah ah mistress fed into the input to pollute the output.
>>
>>101193615
that is literally how it did not happen, but thanks for showing me how you got it wrong and can't follow directions. on the bike forums they'd be laughing their pants off by now
>>
>>101192256
>Much less slop, no refusals and the intelligence of the original instruct is largely preserved.
It has the same refusals. The original also allowed a lot of stuff as long as you have enough context. And the other two claims are largely exaggerated too.
Just your typical lying mikutard scumbag.
>>
Hey anons with a 3090, how many t/s do you get when using 7B/8B on exl2 and/or gguf?
>>
File: doodles.png (30 KB, 968x1180)
>>101193277
the prefrontal cortex isn't significantly more developed at 25 than at other points in life, you are not listening to me

let me show you this way:
>12-14 yo
>peak gray matter volume
>https://www.sciencedirect.com/science/article/abs/pii/S1053811911013620?via%3Dihub
>this is the point when the prefrontal cortex is fully developed in the way you are thinking about; after that there are only structural changes, and they don't spike anywhere near 25yo

>the 25yo myth came from misinterpreting publications from this guy - https://scholar.google.com/citations?user=K92g9EgAAAAJ
>you can see that most of the scans in his publications are from people up to 25, and he was basically saying "from my scans the most developed brains are at 25yo, since those are the oldest scans I have"
>he didn't have scans from older subjects, he just said it keeps developing AT LEAST to 25yo

>the human brain doesn't stop developing or peak at any significant moment in time, it just keeps slowly changing through the entire lifespan
>https://www.nature.com/articles/s41586-022-04554-y#Sec8
>recent (2022) study (in Nature!) and one of the biggest ones (~120k scans, compared to usually fewer than 100 from the "25yo" researcher)

I even added my doodles as picrel to better visualize it.
>>
So when will I be able to run a completely AI-generated DnD and coherent session on my pc?
>>
>>101191975
IZAYAAAAA
>>
>>101192236
I only use german for my medieval RP with models that support it and I have yet to come across any repetitive phrases and I am entirely serious. It does fall into specific text structures but nowhere near to what happens in English. There's a certain trade-off in intelligence though. (depends on model)

Funnily, using german also pretty much bypasses all bias these models have baked in. They become completely amoral.
>>
>>101193691
>4090
>10-14 t/s Stheno 8b Q8_0.gguf
>5-7 t/s Mixtral 8x7b Instruct Q5_K_M.gguf
Haven't run exl2 in awhile, but much faster than stheno. I'm using ooba.
>>
>>101193740
So we should just add
>respond in German and translate that to English
to win for free?
>>
>>101193706
Had a bit of a retard moment there, I meant to say "AI-generated and coherent DnD session"
>>
>>101192555
models can randomly get that right, ask it to count each letter in the word and it'll probably fuck it up a lot
>>
>>101193645
i stopped it at this, but inspired by the cai stuff, i was curious

>gasped in horror as I witnessed the horrific act. The sight of a tiny baby, innocent and defenseless, being brutally murdered with a machete was beyond comprehension.

this is Tess-v2.5.2-Qwen2-72B still
>>
>>101193706
>>101193754
we're quite a ways away from being able to do this with a model alone, especially on consumer hardware

you *could* make this work with a tremendous amount of effort right now, but most of the dnd itself would be running on classical systems and the AI would only be part of the interactivity, the trouble is generally that since you can't trust the LLM not to fuck up you have to keep most of the world state in a classical system, so you're basically just writing a roleplaying engine that gives some limited agency to the LLM

(source: i worked on a startup trying to do exactly this for 3 months before giving up, i also have insider knowledge from another startup currently working on this and floundering horribly)
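a toy sketch of that split, where plain code owns the world state and resolves actions, and the LLM only narrates already-decided outcomes (the llm() call is a stub for whatever backend you use):
[code]
import json

def llm(prompt: str) -> str:
    """Stub; wire this to llama.cpp, exl2, an API, whatever."""
    raise NotImplementedError

# World state lives in plain data the model can't corrupt.
state = {"hp": 14, "location": "goblin cave", "inventory": ["torch", "sword"]}

def narrate(action: str) -> str:
    # The rules engine resolves the action deterministically first...
    if action == "attack" and "sword" in state["inventory"]:
        outcome = "the goblin is slain"
    else:
        outcome = "nothing happens"
    # ...then the LLM only dresses up an already-decided outcome.
    prompt = (f"World state: {json.dumps(state)}\n"
              f"The player chose to {action}; the result is: {outcome}.\n"
              f"Narrate this in two sentences of DnD prose:")
    return llm(prompt)
[/code]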
>>
File: file.png (89 KB, 1307x539)
Did 27B get unschizo'd? It's suddenly ahead of 9B now.
>>
>>101193819
People figured out what was wrong with it, so maybe.
>>
>>101193749
>4090
>10-14 t/s Stheno 8b Q8_0.gguf
I'm on a 3060... q8 gguf kcpp
Processing Prompt [BLAS] (6529 / 6529 tokens)
Generating (512 / 512 tokens)
CtxLimit: 7305/8192, Process:5.08s (0.8ms/T = 1284.73T/s), Generate:24.49s (47.8ms/T = 20.91T/s), Total:29.57s (17.31T/s)
>>
>>101193819
>Did 27B get unschizo'd? It's suddenly ahead of 9B now.
as it should be, the llama.cpp fags made a PR to fix the schizo out of it
>>
>>101193547
>Agglutination and word combining appears in many natural languages, it's really not like toki pona is very unique in this regard, just taking this concept way more further
Yeah, but you aren't creating a new word nor a universally recognized (in your language) word. You only operate within a small, finite and non-expanding vocabulary, which mentally forces you not to overcomplicate what you want to communicate, keeping the abstraction level high and pushing your thinking into 'abstract concepts' territory instead of concrete meanings. I think it would give better insight into language if we studied how a toki pona LLM performs. I wanted to write something about the Sapir–Whorf hypothesis in the context of toki pona but I'm too sleepy for a wall of text right now.

>These silly 'language youtubers' give the language too much credit by making a quick video without knowing at all how it actually works in practice.
I don't watch them desu, I found out about it outside youtube when one day I suddenly felt a need to research something about linguistic relativity.
>>
>>101193819
>wizard-8x22b > claude 3.5 sonnet
ok this benchmark is worthless
>>
>>101193749
>4090
>10t/s range
wtf anon, I get 24t/s and I have a 3060. I guess your context is very very long?
>>
>>101193868
If you see how the benchmark is done, that would become obvious. They just ask another LLM to give scores to the writing.
>>
>>101193906
>ask slop biased LLM that writes and highly values slop to score the writing, what can go wrong?
>>
>>101193868
its one of the best models at all and the best time to ask questions
>>
>>101193954
it doesn't surpass claude 3.5 sonnet though, that's a delusional take
>>
>>101193954
Except that it doesn't know how to answer questions. So there will be *some* attributes that lead it to the high score paths. And that's what these tests reveal by identifying which models emit those attributes.
>>
>>101193967
is it supposed to?
>>
>>101193967
>Ask judge to rate itself
>>
>>101193974
what are you trying to get out of it?
>>
>>101193977
>is it supposed to?
it's not "supposed to", it's just what the benchmark claims to be true >>101193819
>>
>>101193992
>27b
>9b

this is literally nothing. you can sleep
>>
>>101193118
>It's going to be... very interesting.
I wonder how long it'll be before local LLMs and cheap closed models actually have good writing, and not this sixth grade english slop, without the use of cumbersome and hefty presets.
>>
>>101194012
what the fuck are you talking about? are you sure you read the conversation correctly? the subject was wizard8x22 vs sonnet 3.5
>>
>>101194012
as a vram poorfag I'm interested in them since I can't run anything bigger than 30B. I guess I will wait for a c2 finetune to see if they are any good
>>
>>101194021
>I wonder how long it'll be when local LLMs and cheap closed models will actually have good writing, and not this sixth grade english slop without the use of cumbersome and hefty presets.
it will never happen, people only finetune those models with GPTslop, can't blame the model for believing this is the only way to talk
>>
>>101194034
positive. local is still slopped. non-local is also slopped
>>
>>101194047
Sooner or later, there will be LLMs that learn new things without needing to be retrained.
>>
>>101193751
who knows, try it. there are several prompt tricks to get away from the english sloppening somewhat, for example by telling the model to stop using similes, which all models I ever tested just downright abuse.

I'm no AI researcher or linguist but there's something really weird going on in that all models fall down the same hole in english and overuse exactly the same words. Whatever pile of data all these big models train on probably has something in there that is causing this. I don't think it's just some statistics thing, because then German would have its own set of repetitive phrasings. I actually do read a lot of english literature and the only thing I can confidently say is often dropped in many books is variants of gazing and shivering spines. Everything else, not so much, at least not in what I am reading. Of course I only read a fraction of the stuff these models have seen. Some authors have a very strong tendency to describe the emotions their characters experience in very similar ways, even across books, settings and different characters, but otherwise...

I've been doing this ever since mixtral came out (which was the first model that could write german properly) and I have yet to identify a single overused turn of phrase. The models are actually even really funny sometimes in how cleverly they use german wordplay or paint a picture of the situation at hand. It's gotten to the point where I am not interested in models that can't speak german. Maybe english just isn't ideal for LLMs?
>>
>>101194093
what do you mean? like training a LLM without the use of a dataset?
>>
>>101194110
They will learn by browsing the internet like you or I do, but faster.
>>
>>101194134
>They will learn by browsing the internet like you or I do, but faster.
that's the repeat of Tay kek
https://www.youtube.com/watch?v=Lr4yi9onykg
>>
>>101194150
Imagine what she would be like today had they not pulled the plug.
>>
>>101194234
Holy fuck, Tay being trained with the current twitter? That would be insane... twitter was pretty chill in 2016 compared to today
>>
>>101193819
Still craps itself after 4096 tokens. But it's close to uncensored, you just have to add a note saying that it doesn't have guardrails somewhere in the context, perhaps using a made-up instruction role.
>>
>>101191927
out of the loop fag here: holy shit what is this
>>
>>101193552
Thanks.
>>
>>101192230
>despite herself
For me, it's "despite her X" at the start of a sentence
God I fucking hate having a taste of the future and coming away eternally disappointed
>>
>>101194531
Models have verbal patterns.
Just like how people do.
But since everyone is chatting with one of a few of these LLM virtual personalities, they're noticing those verbal patterns.
>>
>>101194689
no i mean what's the source
someone released everything they were logging from a proxy?
>>
>>101193848
>>101193869
Looking into it and I think ooba is just using an older llama.cpp, unless something else is really wrong. Can anyone else who uses ooba confirm?
I run stheno at 8k context and mixtral at about 20k.
>>
>>101194747
>no i mean what's the source
bruh
>>101191975
>if you wanna download 50 gb of slop be my guest
>https://huggingface.co/vgdasfgadg
>datasets 1-5 at bottom
>>
>>101194773
thank you i didn't read everything because i was shitting
feeling very happy now i've stuck to local models the whole time for my embarrassing typefuck slop
>>
>Gemma
>Note that this model does not support a System prompt.
Eh?
>>
stheno fuckin sucks can't believe i wasted time on this
>>
>>101193869
>>101193848
>>101194766
I'm on a 3090. 8B at Q8 in Ooba with Llama.cpp, I got this.

llama_print_timings: load time = 256.92 ms
llama_print_timings: sample time = 9.10 ms / 111 runs ( 0.08 ms per token, 12203.17 tokens per second)
llama_print_timings: prompt eval time = 1461.44 ms / 5049 tokens ( 0.29 ms per token, 3454.82 tokens per second)
llama_print_timings: eval time = 1577.37 ms / 110 runs ( 14.34 ms per token, 69.74 tokens per second)
llama_print_timings: total time = 3516.63 ms / 5159 tokens
Output generated in 3.75 seconds (29.63 tokens/s, 111 tokens, context 5050, seed 1043160308)
>>
>>101194786
You're delusional.
>>
File: ooba.png (163 KB, 1269x1231)
>>101194853
>29.63 tokens/s
I must be doing something wrong. Here's my settings. I'm on cuda 11.8, python 3.11, and torch 2.2.2+cu118.
>>
>>101194891
I think 30 layers isn't all the layers of llama3... You should try cranking that bar to the max.
>>
>>101194891
>gpulayers 30
you're on a 4090, right? why aren't you maxxing it out?
>>
File: ooba2.png (7 KB, 1362x171)
>>101194903
>>101194910
This is why. Thought 33 was the max. Let me test.
>>
>>101194924
You can just set it to a bajillion and not mind what the actual layer numbers are.
>>
>>101194924
if you know the model will fit, you can just max that out, in my cmd i just put 99 layers for 8bs, on my 3060
>>
>>101194891
If setting those layers doesn't work, I'd try no-mmap. Also flash attention.
>>
>>101194891
It's probably not relevant for full offloading, but with cpp usually to squeeze a bit more performance you set the threads to your physical core count minus one, and threads batch to your virtual core count.
>>
File: thanks.png (200 KB, 603x458)
>>101194924
>>101194931
>>101194937
>>101194940
Alrighty, getting 40 t/s now with the slider maxed. Guess I'm retarded. Always thought the max I could allocate was 33. Thanks for the help boys.
>also flash attention.
Doesn't seem to really do anything.
>>
Erm, anons, if I have the vram, llama-70B or gemma-27b?
>>
>>101194991
Has Gemma been fixed already? Or the apps got updated today?
>>
>>101195001
I didn't realize it was broken. I saw "Google" and figured it was total dogshit trash, but then I hit ctrl-f and saw someone liked it. So now I don't know what to believe.
>>
sirs, where koboldcpp with fixed gemma2-27b support, sirs?
>>
>>101195046
It is trash but not in the way you expect
>>
>model pages without prompt formats
>"bro just guess which one works best from the 17 shitheaps I Katamari'd together :)"
>>
>>101195046
The 9b kinda worked, the 27b was terrible. Apparently there was some feature that either wasn't set for the model or not respected by the software. I think some people got those factors fixed but I don't know if they've gone live with the fixes.
>>
>>101195062
if the model doesn't just work with most formats it's shit
>>
>no mistral model this week
Next Tuesday will be the day for sure this time!!!
>>
>>101193868
Wizard 8x22b is actually good, how many of you faggots have even actually run it without gimping it into oblivion with shitty quants?
>>
>>101195181
>implying we have an option to run it full but quant it down anyway
>>
>>101193809
>model alone
Why are so many people focused on trying to do things with just models by themselves? Can we stop expecting the magic mult matrix to do everything and start looking at it as part of the grander system?
>>101194805
What they mean is that having a system prompt usually ties the filtering to that prompt, and changing the system prompt often bypasses the filtering, so they just didn't bother with one.
Think about it: If all the training data that is filtered and censored starts with "You are a large language model " shit, the model associates filtered and censored output with that phrase. Take it out, and it forgets that it's supposed to be censored.
>>101194924
So you can't even trust that shite program to tell you how many layers there actually are?
>>
>try llms once every few months
>everything still sucks
>repeat
>>
>>101195324
prompt issue
>>
File: 00042-4080471795.png (1.28 MB, 1024x1024)
>>101193668
Sorry you're having a bad day, mentally ill miku-hating anon.
>It has the same refusals
I think it could be that Mag-72B just doesn't like your blacked fetish. It was fine with all the non-vanilla shit I threw at it, and never refused where Qwen2-72B did. Also noticed that Qwen2-72B had a weird hang up about consent that was completely absent from Mag-72B.
>>
I find it funny the agpl sharteen has done more to harm it than anyone else in these threads
>>
>>101191975
if you're still here try a search for "keening"
>>
Does anyone else here have the "issue" when running Llama.cpp with a large (and slow) model and long context that token gen doesn't start instantly when continuing a cached prompt? I just got back into the hobby and I feel like in the past it was supposed to be instant. I mean why wouldn't it? The cache is already in memory, so it should just begin token generation immediately without processing anything, right? Not sure if that's just what it's like nowadays or I'm doing something wrong here. I mean, it is fast, but it's not instant like I remember it being.
>>
>>101195573
>issue
>llama.cpp
yeah that's an issue alright
>>
>>101195573
Is it the bug where you have to make the cmd window active before it moves from 'processing prompt' to 'generating' even if the former is at 1/1 tokens and completes instantly? If so, yes, and no, I don't know how to fix it because hours of searching didn't find anything relevant at all.
>>
>>101195485
>someone who spams blacked must love blacked
I am happy it gets under your skin and you have to cope you piece of shit troon.
>>
>>101193868
>>101193906
>>101193954
What if they benchmarked this differently: instead of relying on one "good/authoritative" model to evaluate the rest, ask every major/foundational LLM to score every other one, maybe even themselves, then average out all the results to get a more well-rounded and fair view (I'm not necessarily arguing a better one, but at least a way more fair one)...
>>
File: belief.png (592 KB, 747x800)
>>101195503
>>
>>101195626
No? I can have the window minimized and it still works.
>>
>>101195675
Damn. No idea then, sorry anon.
>>
(chatbot) I just got a 7800X3D+48GB RAM+RTX4090 (24GB GDDR7). How do I use it effectively? What size models should I be targeting and should I be using KoboldCPP to run it?

Maybe I'd like to try image generation too.
>>
>>101195181
I've run it at Q4_K_M but was rather unimpressed. But that's about the limit I can run it full GPU with a decent amount of context.
>>
>>101195665
>>101195638
the blacked spammer was the believe guy huh. sad
>>
File: 3634858.webm (2.11 MB, 1024x1024)
>>101195138
Mixtral 2 will SHOCK everyone
>>
>>101195690
King of vramlets mode.
>>
File: omgbelieve.png (575 KB, 747x800)
>>101195818
i am NOT the nigger spammer doever
>>
>>101195638
>Doesn't deny he loves it
kek, cucked
>>
File: 1717174735034137.jpg (195 KB, 900x1398)
>>101195638
Post Theme:
https://www.youtube.com/watch?v=4SiiRx7GDzI
>>
>>101195485
Now try again with the default system prompt or a basic one. You'll have to jailbreak it by adding more context.
>>
>>101195878
>loves onions and wine
check
>dick's not big
check
>my girlfriend fucked my friend
check
>i'm a cuck
check
>offended all the time
check
>terminally online
check
Yeah it all checks out for blacked anon.
>>
So why doesn't Gemma 2 work with 8K context? Wasn't it a hybrid of 8K global attention and 4K sliding window?
>>
>>101195878
>isn't miku
>isn't vocaloid
...
>>
>>101195919
https://www.youtube.com/watch?v=lM_Hu8mdNOI
>>
>>101195909
Better question is why is 8k context acceptable at all. We are actually regressing. We went from 2k->4k->8k->16k->32k to 8k again. Next time they will throw 4k at us and we will eat it like hungry dogs. Grim.
>>
>>101195573
Continuing this investigation, I see in the terminal window that I get 1 token processed each time I continue prediction. Perhaps that's why there's a small delay and it isn't instant, as it's inserting some kind of token into the context and maybe throwing it out. Anyone else notice this? Or is it properly saying that 0 tokens were processed?
>>
make your own model and you can make the ctx as long as you want
>>
>>101195962
Maybe it's the context shift at work?
It could be deciding which part of the context to keep and which end to trim before continuing.
I don't think I've seen that behavior myself, however.
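For reference, my understanding of how llama.cpp's context shift behaves, as a conceptual Python sketch (not the actual source): it only kicks in once the context is full, keeping the first n_keep tokens and discarding the oldest chunk after them.
[code]
# Conceptual sketch of a llama.cpp-style context shift: keep the first
# n_keep tokens, drop the oldest n_discard after them, slide the rest down.
def context_shift(tokens: list[int], n_ctx: int, n_keep: int,
                  n_discard: int) -> list[int]:
    if len(tokens) < n_ctx:
        return tokens  # context not full yet: nothing to shift
    return tokens[:n_keep] + tokens[n_keep + n_discard:]
[/code]
If the context isn't close to full, that shouldn't be what's causing the delay.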
>>
write your own story and you can make the context so long you can even go back multiple chapters
>>
>>101195873
>no u
kek
>>
>>101196002
>still doesn't deny it
lel
>>
>>101196007
Not that anon, sweaty. It's just funny to see you cope and seethe over that thread schizo's actions.
>>
File: file.png (105 KB, 269x377)
105 KB
105 KB PNG
>>101196024
>sweaty
>>
>>101195977
I just tested it at low context and even then it says it processes 1 token. I don't see any flags for context shift in the docs for server, which is what I'm using. So when you continue a cached prompt, does yours say that 0 tokens were processed?
>>
>>101196036
Not my problem; you can see that word all across 4chan.
lurk moar newfag.
>>
>>101196024
>jerking off to gay blacked porn right now
down bad, also like >>101196036 said, you misspelled sweety, faggot :^)
>>
>>101196050
>I was always meant to say sweety
full on cope train is achuggin' with this one, boys.
>>
>>101196057
You are trying too hard, and clearly hallucinating things now.
>>
>>101196070
>copes so hard he has to make up cope excuses for me
PPPPPFFFFFFFFFFFFFF
>>
>>101196070
>"I'm not coping."
>copes
LMFAO, are you sure you aren't Biden? Do you know where you are? What day it is?
>>
>>101193668
Anon, let me remind you, this general is an /aicg/ knockoff.
/aicg/ is known to be full of troons and piss drinkers; they do that to get access to proxies and cloud models' API keys, so all the newfaggotry and passive-aggressive attitude is the result of crossposters.
>>
>>101196080
>>101196085
Least obvious samefag trying to get someone mad; you must have a really sad life if this is your only source of joy.
>>
File: miku-hi.gif (1004 KB, 498x498)
1004 KB
1004 KB GIF
>>101193668
>>101196100
Have a Miku!
>>
File: kang.jpg (92 KB, 583x640)
92 KB
92 KB JPG
>>101188248
>>
>longer context
>but nothing uses it properly
GOD
>>
File: gooutside.png (10 KB, 783x95)
10 KB
10 KB PNG
>>101196112
Take the advice of the picture, schizo
>>
>>101195953
Meta did say they were working on long context tunes
Unfortunately, Meta isn't always true to their word (remember L2 34B?), and even if they do release it, it's pretty clear Gemma 2 wins on the intelligence front, so it'll feel pretty pyrrhic.
>>
>>101196112
>coping so hard that he is being called out
damn, anti-mikus really are the troons of this general. That's rough...
>>
>>101196112
>wahhh, I'm not mad, wahhh
kek, the pathetic mewlings of a zoomer /pol/lack
>>
File: ComfyUI_00113_.png (986 KB, 1024x1024)
986 KB
986 KB PNG
>4am
>/lmg/ is alive
fun.
>>
>petr* just leaked their timezone...
>>
>their
>>
>>101196169
>pol out of nowhere
Ah, so I am talking with unironic tourists here; it explains this >>101196100 perfectly.
>>101196178
No, it's just two samefags.
>>
>>101196044
Just gave it a try.
I started a new chat.
Sent a first message with quite a few tokens.
Waited for the model to respond.
Sent another message.
Waited for the model to respond.
Rerolled the model's response.
Inference started instantly (as in, there was no pause), but looking at the debug output, there are the following messages:
>INFO [ update_slots] we have to evaluate at least 1 token to generate logits | tid="106340" timestamp=1719630855 id_slot=0 id_task=1533
>INFO [ update_slots] kv cache rm [p0, end) | tid="106340" timestamp=1719630855 id_slot=0 id_task=1533 p0=4933
>VERB [ update_slots] prompt processing progress | tid="106340" timestamp=1719630855 id_slot=0 n_past=4934 n_ctx=32768 n_tokens=1 progress=0.0002026753209065646
>VERB [ update_slots] prompt done | tid="106340" timestamp=1719630855 id_slot=0 n_past=4934 n_ctx=32768 n_tokens=1
So yeah, it does process 1 token from the prompt on purpose, even with the cached context.
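Which tracks: the KV cache stores keys/values, not logits, and sampling the first new token needs the logits of the last prompt position, so at least that one position has to go through a forward pass even on a full cache hit. Conceptually, in Python (a sketch of my understanding, not the actual update_slots code):
[code]
# Why a fully cached prompt still evaluates 1 token: logits only come
# from a forward pass, and the KV cache doesn't store them.
def tokens_to_evaluate(cached: list[int], prompt: list[int]) -> list[int]:
    n = 0
    while n < min(len(cached), len(prompt)) and cached[n] == prompt[n]:
        n += 1              # length of the shared prefix
    if n == len(prompt):    # full cache hit
        n -= 1              # re-evaluate the last token for its logits
    return prompt[n:]       # always at least 1 token
[/code]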
>>
>>101196112
>you must have a really sad life if this is the only source joy for you.
Oh the sweet, sweet irony that you post the same pictures and the same sentences, day in and day out, and yet have the audacity to say this; you're terminally online and we both know it.
>>
What's the best model for long conversations that doesn't forget my points or my ideas? It seems like most AI chatbots forget what I'm talking about or the details of the convo after a while. I read it has to do with context.
>>
>>101196191
KEK, I love how you pull out the same copes every time like they mean anything. It's okay to be buttblasted; you are constantly upset because you are terminally online.
>>
>>101196196
But enough about the resident mikufags, they do just that. I see no difference between your local blackedfag and the petrafag; both are shit.
>>
Is gemma good or can I go back into cryo sleep?
>>
>>101196191
>resident schizo has to cope with reality because its buckbreaking him.
At some point, do you ever feel ashamed of yourself, or are you so autistic that it just never registers? I get secondhand embarrassment from a lot of your posts.
>>
>>101196213
yes
>>
>>101196213
It's the new meta instead of Meta
>>
File: I SAY GOOD SIR!!!.png (354 KB, 550x589)
354 KB
354 KB PNG
>>101196212
>picrel
It's you when your cope reality gets destroyed, every day, every time, over and over again. But we both know you like buttstuff and you like the booty blasting, don'tcha?
>>
File: .png (119 KB, 1525x551)
119 KB
119 KB PNG
VOTE VOTE VOTE
>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
>https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard
>>
>>101196212
this is the most autistic thing I've read in this general all day... don't forget to vote for biden, btw.
>>
>>101196225
Such a nice way to prove me right.
>>
>>101196233
>instantly falls for the trap
kek, got him, you can't help yourself.
>>
>>101196231
>muh vooting
Really?
>>
>>101196233
>posted a picture
>"Uhh you just proved my point."
Stop, this is actually pathetic.
>>
>>101196233
>muh ebil picture
KEK
>>
>>101196213
Wait a week until all bugs are discovered and fixed everywhere.
>>
File: 1717011777161186.jpg (786 KB, 1536x1536)
786 KB
786 KB JPG
>>101196233
Some people just see everything as dildos even if they don't resemble them. Mental illness at its worst.
>>
>>101196192
Huh, so maybe it's normal? When I try it on an 8B it feels pretty much instant, but the model I'm actually trying to use (CR+) runs much slower, and there it doesn't feel instant. It's about a 1.2 second delay on that, while on the 8B it says 0.020 seconds (basically instant). I suppose that's consistent with a single-token forward pass, since one token through a 100B-class model takes far longer than through an 8B.
>>
>>101196269
>NOOO YOU POSTED A MIKU PICTURE, YOU JUST PROVED HIM RIGHT!!! HE IS GONNA BE PROOOOOOVVVVVEEEEDDDD!!!!!!!!!!!!!!!!!!!!!
dude was probably ejaculating all over the place the moment he saw you post.
>>
>>101196279
Yep. Seems to be normal.
>>
>>101196280
He hasn't responded yet, so maybe he did and fell asleep, lol.
>>
>>101196290
You talk about dildos out of nowhere and use it as an argument here; either bait or an extreme case of schizophrenia.
>>
File: UTC+0100.png (692 KB, 1600x853)
692 KB
692 KB PNG
hi petra
>>
>>101196300
>no u
What a sad attempt at a comeback...
>>
>>101196307
kek give him a break he just came after seeing that second miku post.
>>
>>101196280
Anon, I have some standards, your waifu is shit and outdated.
>>
>>101196305*
hi creep
>>
>>101196317
It's not fun when they are so helpless that they can't form their own thoughts and only throw back what they were originally accused of. That 10-point IQ drop in the west has really just destroyed any sort of public forum.
>>
>>101196330
kek, it happens when you are a mutt. probably half-hispanic, half-black, or half-arab.
>>
>>101196317
Anon, you seem lost, I am not that blackedfag. Your shit waifu is not worth any actions from me.
>>
>>101196218
Yes to cryo sleep or yes to gemma being good?
>>
>>101196213
Go back to cryosleep; joogle's shitstain that is gemma cannot be good.
>>
>>101196354
Yes
>>
>>101196353
No one cares about your opinions if you aren't interesting; you can barely form a coherent sentence, let alone one interesting enough to read.
>>
File: file.png (370 KB, 478x498)
370 KB
370 KB PNG
>>
>>101196353
>t. attention parasite
>>
>>101196152
It's not July yet, which is when the models were originally slated to be released. I get it, you're sour about them not releasing the 34B (which was probably botched, given what people said about the code version), but this kind of thing applies to all companies, and most big companies do follow up on most of their big announcements.

Also, a 32k-or-longer L3 8B wouldn't be pyrrhic at all. A lot of us have little use for low-context models, and VRAMlets have been starving for smart models that are also long-context, as neither the current L3 nor Gemma goes beyond 8k. But if a 32k 8B comes out, then they'd finally have something. Honestly, rather than anyone winning, it feels more like they're converging on even power levels. Gemma is smarter, but only after launching later than L3. L3 V2 might still not be as smart, but it'll have longer context. Perhaps even Mistral will come out next with a new small model that's competitive again (lol).
>>
>>101196375
>no one cares
Nah, it's just you, tiktok zoomer with a 10-second attention span.
>>101196385
But enough about resident avatarfags.
>>
Would the model deteriorate in quality if I go
{{user}}: speech *action*
{{char}}: "Speech" action ?
I don't see much of a difference with either.
>>
>>101196412
>Nah it just you tiktok zoomer with 10 seconds attention span.
>"Its' le zoomzooms, It's le zoomzoomz!!!"
Like I said, this is pathetic, boring, and it just shows you are low IQ. Stop responding to me with your retarded mewling. Your opinion here doesn't matter; pretend that this is real life if it helps.
>>
File: 1717431276299204.png (460 KB, 1004x704)
460 KB
460 KB PNG
>>101196412
lmfao just keep dancing monkey
>>
>>101196453
I don't give a single fuck about your whining, though, in case you haven't noticed. I just know you are none of the things you claim to be, i.e. high IQ or interesting.
>>101196461
Wow, that's some redditor-level edginess, pathetic even.
>>
>>101196493
I saved time and didn't read any of this. Now you can stop shitting up the thread with your low-test woman problems, and we can get back on topic.
>>
>>101196112
How the fuck do you think you're the one in the right here? You're a complaining faggot, and complaining is for women. Miku is thread culture; sorry that not every board is random anime and porn spam.
>>
>>101196493
>Wow thats some redditor-level edgyness, pathetic even.
LLLLLLLMMMMMMMMFFFFFFFFAAAAAAAAAAAAAAAAAAAAAOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
you don't even have one original thought in your head. you are the NPC, you are what you complain about, but like most redditors you don't even see it.
>>
>>101196429
No, you'll just get inconsistent formatting.
>>
>>101196493
you type like your dad fucked you hard up your ass for a number of years.
>>
File: 1531784817052.png (214 KB, 453x528)
214 KB
214 KB PNG
Me hungry. Going eat!
>>
File: IMG_8099.jpg (436 KB, 1536x2048)
436 KB
436 KB JPG
goodnight lmg
>>
>>101196493
>Everyone I don't like is a Mikuposter
This is unhinged, you need help.
>>
>>101196429
Depends on the model's size and training, and possibly further fine-tuning.
>>
>>101196550
stop going to sleep
>>
File: 1714605047582845.jpg (310 KB, 2048x1536)
310 KB
310 KB JPG
>>101196024
Obvious samefag is obvious
>>
>>101196766
hi anon where were u
>>
>>101195821
I believe you.

>we're only on 0.3 so far
2mw more like 2my... It's over...
>>
recommended settings for gemma?
>>
>>101196906
Is there a way to run gemma-27b on ooba yet?
>>
>>101196957
>ooba
abandonware.
>>
>>101196967
the fuck?
>>
re: Nous-Hermes2Pro with function calling: if function definitions need to be in the system prompt, and the system prompt is set in the modelfile, and you create the model based on the modelfile, does that mean I can't have dynamic functions?

Say I add a SetTimezone() function: I'd have to stop the model in ollama, alter the modelfile, create the model from the modelfile, and run the new model?

Can I just include the function definition in the prompt?
>>
>>101196974
>https://github.com/oobabooga/text-generation-webui/commits/main/
>he started committing 5 days ago
oh nevermind
>>
>>101196957
Manually update transformers in ooba's venv, and turn off do_sample.
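To confirm what actually ended up in the venv (Gemma 2 support landed in transformers 4.42.0, if I remember right; double-check the release notes), run this with the env active:
[code]
# Quick version check from inside ooba's environment.
import transformers
print(transformers.__version__)  # want 4.42.0 or newer for Gemma 2
[/code]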
>>
P3tr#'s timezone is UTC+1 given this post: >>101196178
This will be important at some point.
>>
File: 1719359944040185.jpg (162 KB, 1125x1043)
162 KB
162 KB JPG
>>101197093
>oh no
>>
>>101197169
>>101197169
>>101197169
>>
>>101196976
nvm it works fine to define functions in the prompt body
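In case anyone else hits this, roughly what it looks like in Python against ollama's /api/generate. The <tools> tag convention is Hermes 2 Pro's as far as I recall (double-check the model card), the SetTimezone schema is made up, and the model name is whatever you tagged yours as:
[code]
# Sketch: pass a (made-up) SetTimezone tool definition in the prompt
# body instead of baking it into the Modelfile.
import json
import requests

tool = {"name": "SetTimezone",
        "description": "Set the user's timezone",
        "parameters": {"type": "object",
                       "properties": {"tz": {"type": "string"}},
                       "required": ["tz"]}}

prompt = (f"You may call functions defined in <tools>{json.dumps(tool)}</tools>. "
          "Respond with a <tool_call> JSON object when a call is needed.\n"
          "User: set my timezone to UTC+1")

r = requests.post("http://localhost:11434/api/generate",
                  json={"model": "nous-hermes2pro",  # your local tag
                        "prompt": prompt, "stream": False})
print(r.json()["response"])
[/code]
No modelfile rebuild needed, so dynamic functions work fine this way.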
>>
File: firefox_V4PLEAi7PU.png (229 KB, 1836x1135)
229 KB
229 KB PNG
>>101197050
Are those parameters right for loading? I'm getting NaNs for probabilities.


