/g/ - Technology

File: 114774158_p0.jpg (3.47 MB, 3343x4737)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101214216 & >>101205004

►News
>(06/28) Inference support for Gemma 2 merged: https://github.com/ggerganov/llama.cpp/pull/8156
>(06/27) Meta announces LLM Compiler, based on Code Llama, for code optimization and disassembly: https://go.fb.me/tdd3dw
>(06/27) Gemma 2 released: https://hf.co/collections/google/gemma-2-release-667d6600fd5220e7b967f315
>(06/25) Cambrian-1: Collection of vision-centric multimodal LLMs: https://cambrian-mllm.github.io
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101214216

--Power Management for 4090 GPUs in /lmg/: >>101216999 >>101217102 >>101217590
--Llama3-405b Development and Release Speculation: >>101217383 >>101217415 >>101217435
--Fixing Slop in NSFW Roleplay with Control Vectors and Diverse Datasets: >>101214325 >>101215842
--The Future of LLMs: Stagnation or Untapped Potential?: >>101214713 >>101215882 >>101216246 >>101216453 >>101216357 >>101217018 >>101217411
--Sonnet 3.5 Makes Local Models Obsolete, But at What Cost to Startups?: >>101215426 >>101215449 >>101215461 >>101215475
--P100s vs P40s for Mikubox and Cheap VRAM: >>101218031 >>101218185 >>101218458 >>101218557 >>101218677 >>101218774 >>101219115
--glm3 and glm4 Support Coming to LLaMA.cpp: >>101220273 >>101220356
--Gemma 2 9B SPPO Iter3 Breaks into AlpacaEval Leaderboard: >>101217131 >>101217369 >>101217564 >>101217614 >>101217663 >>101217739 >>101217602 >>101217856 >>101218161 >>101218174 >>101218550 >>101218599 >>101219456
--Gemma 2 27B and Alternative Models for 16 GB VRAM: >>101215609 >>101215738 >>101216012 >>101216621 >>101216799 >>101216080 >>101220457 >>101220548 >>101220597 >>101220668 >>101220735 >>101220983
--Fixing Slow Generation in Kobold Group Chats with ST: >>101222623 >>101222648 >>101222653 >>101222680 >>101222857 >>101222899 >>101223220 >>101223288
--Anon Rants about Broken Model Configs and Token Issues: >>101217361 >>101217425 >>101217688 >>101217739 >>101217559 >>101217576 >>101223820 >>101224123
--Mistral.rs and 27B Models Hit Memory Wall with Larger Contexts: >>101216383 >>101216870 >>101216903
--Anons React to Cohere CEO's Human Centipede Effect Statement: >>101224025 >>101224150
--Best Japanese Model Recommendation Needed: >>101214617 >>101218336 >>101218652 >>101218831 >>101219626 >>101220598 >>101221130
--Gemma2 Context and Instruct Templates Explained: >>101218277 >>101218394
--Miku (free space): >>101223447

►Recent Highlight Posts from the Previous Thread: >>101214230
>>
let's just keep shitting out slopped assistant models with synthetic data instead of training on terabytes of human chatlogs for a more natural and human AI
>>
let's just keep shitting out slopped assistant models with synthetic data instead of training on terabytes of human posts from 4chan
>>
>buy used 3090 Ti
>download 2GB vram modules from Uncle Chang and swap for 48GB total
>Mog Tesla A40 with higher clock speeds and overclocking at 1/5 the price
Give me one good reason not to do this
>>
>>101216357
That's not going to happen. As you scale up the number of parameters, the distinguishability of the training data increases. The models trend towards learning patterns as linear classifiers and then distributing the embeddings so that those classifiers work, which is why you can see a big change in model output with something as simple as a LoRA, since it mostly just moves the embeddings around to get a satisfying output from all of the internal trained classifiers. With higher parameter count comes a larger embedding space, meaning better classifiers + more classifiers. That, coupled with the fact that a transformer with an infinite context window is effectively a Turing machine, means that it won't be long before learned models replace things we traditionally coded by hand. The fact that transformer language models aren't already capable of simulating calculation in context means they haven't quite cracked the code as far as pre-training goes, but it is 100% possible for a transformer to learn a calculator, and once they solve that, no algorithm is off limits.
>>
>>101224321
What uncensored version of Gemma 9b can I download on lmstudio?
>>
>>101224436
None! Go ahead and pioneer so all can learn from your experience.
>>
>>101224375
>terabytes of human posts from 4chan
There's maybe a few hundred megabytes at best
>>
>>101224460
Will big gpu come after me for making the A40 obsolete overnight?
>>
Daily reminder that (You), the person reading this, do not need more than 8192 context
>But I—
No, you don't
8192 is all you need
>>
>>101224618
What about when the RP partner suddenly starts forgetting everything?

>had a great one going
>then she forgot the macguffin that was the whole foundation of the interaction
>everywhere at the end of time begins playing
>>
>>101224458
I can't get him to do a rape story.
>>
File: 2765567894322.jpg (41 KB, 480x360)
So has anything gotten as good as mixtral limarp zloss or should i just go back to cooming?

I have seen no proof of gemma being good.
>>
>>101224618
This and 1 t/s.
>>
File: 1701383403555463.jpg (261 KB, 928x1232)
>>101224321
>>
Potential idea to eliminate slop that would take work, but not an unreasonable amount if someone is autistic enough: teach future models a world model of slop vs. not-slop so that it can simply be prompted away, by writing entire articles about the concept and posting them to github, thus infecting future pretraining datasets.
>>
https://x.com/BasedBeffJezos/status/1807550891781927332

Is Command R+ really that different from OAI and the infinite goyslops local models trained on OAI datasets?
>>
>>101224976
>https://x.com/elonmusk/status/1807637096129241529
>Grok 2, which comes out in August
Oh, nice, can't wait to not run it locally.
>>
>2024.5
>still no Nemotron gguf
>>
>>101225021
It's pretty good for coom, been using it on OpenRouter
>>
>>101225020
Doubt it's gonna be released. They only released Grok 1.0 when they had their 1.5 ready. So when 2.0 hits, they'll prob release 1.5.
>>
>>101225041
They never promised they would keep releasing models from now on; please, let's stop with the hopium.
>>
Recommend me a model for RP, 8b or up to 13b at Q5 at most.
>>
Is Gemma 27b IQ3_M gguf quant better than gemma 9b 8 bit quant?
>>
>>101225473
yeah but better shit is still shit
>>
I am currently using llama3 8b on my old pc and I am blown away

My old i7 6700 with 16gb ram using llama3 8b cpu-only is fast enough for my purposes. From what I understand, llama3 70b at various quantizations can run on 64gb ram, but llama3 70b fp16, the highest-end model, needs around 160gb ram at least, so a system with 256gb of ram is ideal

Is there a significant enough improvement in LLM output going from llama3 8b q4 to llama3 70b fp16 to bother?
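Rough napkin math, assuming nothing beyond weights plus a bit of KV cache and overhead: 70B params x 2 bytes (fp16) is roughly 140 GB for the weights alone, hence the ~160 GB figure once context is added; at q4 it's roughly 70B x ~0.5 bytes, so 35-40 GB, which is why people say it fits in 64 GB of RAM.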
>>
>>101224618
My ongoing RPs get up to like 5k in summary information alone. 16k is the bare minimum to be able to have a long session and then summarize. 20k is better.
>>
>>101225649
llama3 8b q4 -> llama3 70b q4: Yes
llama3 70b q4 -> llama3 70b q8: No
llama3 70b q4 -> llama3 70b fp16: No
>>
So there's still no way to make Gemma 2 27B run locally with the intelligence it has on lmsys, even with Transformers? Are Google intentionally withholding some inference trick needed to make it work properly? Why?
>>
>>101224436
One possible reason could be that NVIDIA has provisions in place to make it not work if you replace the memory modules with any others, but other than that it's a great idea. Thanks, Jackie.
>>
>>101225720
Thanks for the info

From what I understand, llama3 70b q4 will work on a system with 64gb ram. It could run on this old system even if it is very slow; speed/performance is not an issue. If the improvement from 70b q4 to 70b fp16 is not that significant, it might not be worth bothering to upgrade.
>>
Has anybody tried asking core4 solutions, who sell all the parts for the v100maxxing system individually and disassembled, if they have some full units lying around one can buy?

I don't think I have the skill to put together the system with the V100s myself.
>>
is gemma 27b just a lobotomized gemini flash
>>
File: 7634575634.png (764 KB, 617x780)
>having repetition problems
>AVOID REPETITION in prompt in various ways isnt working
>use rep pen
>1.08, 2048, 0.06
>stopped having repetition problems
>characters respond even better
>better gens in general

I was told rep pen was cope this is bullshit
>>
>>101226082
Always have just a little bit of rep pen in there.
>>
>>101224436
Without an appropriate change to the VGA BIOS the increased memory capacity will not be usable.
So typically you can only do these memory mods if there is another GPU SKU with the same chip but more memory.
>>
>>101226082
I don't think anyone says rep pen doesn't work; it obviously does work to prevent repetition.
What they say is that it makes the model dumber in situations where it wasn't going to repeat, by subtly fucking up the probabilities of every token.
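A minimal sketch of the classic CTRL-style penalty most backends use, give or take implementation details. Note it hits every token that has already appeared, whether or not a repeat was actually likely, which is exactly where the "makes it dumber" complaint comes from:
[code]
import numpy as np

def apply_rep_pen(logits: np.ndarray, prev_token_ids, penalty: float = 1.08) -> np.ndarray:
    # Push down every token already seen in the context.
    # Positive logits are divided, negative ones multiplied, so the
    # penalty always moves the token away from being picked.
    out = logits.copy()
    for tid in set(prev_token_ids):
        out[tid] = out[tid] / penalty if out[tid] > 0 else out[tid] * penalty
    return out
[/code]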
>>
>>101224618
Nah, 8192 is too much, goyims only deserve 1024 context
>>
What do we think about DolphinVision 72b?
>>
>>101226205
goy only need enough context to spell 'EMET'
>>
>nonetheless
>nevertheless
>despite...
>>
>>101225460
you will never be satisfied with a 13b for rp
>>
>>101226082
use DRY sampler anon, it's even better
>>
>>101226082
The main problem is that repetition penalty is a brute-force method that doesn't take into account token logits, so you're effectively making the model dumber.

It would be interesting if somebody implemented a "half-typical-p" where it only dynamically cuts the token distribution from the head instead of both the head and tail. The idea is that you'd use it together with min-p. Usually typical-p doesn't work too well because to make it truly have an effect you have to use it at around 0.2-0.5 or less, which removes too many tokens from the tail, reducing token diversity.
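Something like this is what I picture: a rough, untested numpy sketch of that hypothetical "half typical-p" (names made up). Deviation from the entropy only counts for tokens that are more predictable than average, so the tail is never cut here and min-p can deal with it afterwards.
[code]
import numpy as np

def half_typical_filter(logits: np.ndarray, mass: float = 0.95) -> np.ndarray:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    surprisal = -np.log(probs + 1e-12)
    entropy = float((probs * surprisal).sum())
    # Only head tokens (surprisal below the entropy) accumulate deviation;
    # tail tokens get 0 and are therefore kept, unlike in normal typical-p.
    deviation = np.maximum(entropy - surprisal, 0.0)
    order = np.argsort(deviation)                  # least deviant first
    cutoff = np.searchsorted(np.cumsum(probs[order]), mass) + 1
    keep = order[:cutoff]
    filtered = np.full_like(logits, -np.inf)
    filtered[keep] = logits[keep]
    return filtered                                # run min-p on this next
[/code]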
>>
>>101226124
https://www.techpowerup.com/vgabios/267498/267498 would this work or is it fake?
>>
>>101226528
Don't know.
>>
>>101225473
A LOT better.
I don't use local for coding so I can't speak to smarts, but RP is much better on 27b even though it's supposedly still broken vs. ai studio.
9b is full-on gpt slop. I don't know what they did that caused this.
27b too, but much less so. I like it so far, it follows instructions very well.
Big upgrade for ramlets like me who used stheno 9b.

That's if you can offload like 80% to gpu. I'll take waiting 1-2 min for a long reply that's less retarded.
I think in silly you can even set 2-3 swipes at once and get a "ding" sound once it's done. But that's up to you.
>>
Hey anons where do you use XTTS?
>>
>switch to l3 from mistral
>it actually understands my dumb fetishes and does dirty talk very well
I wonder what kind of filth zuckerberg trained this shit on
>>
Gemma2 is still fucked in llama.cpp.
It's not a memory error, but after X turns it just crashes and I need to restart the server.
Not even full 4k context.
>>
>>101226450
at a certain point it's just english that you're seething about lol
>>
>>101226792
l3 isnt that good so damn this shill post sucks
>>
>>101226807
I propose running additional prompt on latest output with a list featuring rare synonyms for words from the output and a prompt like "Consider using these synonyms to enhance your response". By incorporating synonyms from the vocabulary in a random order, we may not only improve the overall diversity of language but also address potential issues of repetition.
>>
>>101226807
Well, English is the slop language by design.
If you start talking to a LLM in Spanish, all problems with shivers and journeys suddenly disappear.
>>
>>101226807
it's just too repetitive, those model only learned one way to speak, and it's the gpt4 slop language, when will we have a giant dataset only made with actual human writing in it?
>>
>>101226957
>Well, English is the slop language by design.
I really disagree with that. If you look at the best authors (Shakespeare, Orwell, Dickens, Bronte, Tolkien...), you can see how English can be turned into something beautiful. What we're getting is just corporate slop language, because those models have been trained on gpt4 slop and nothing else.
>>
>>101226957
You post this every day at roughly the same time
>>
>>101226792
15T tokens anon. That's a lot of fetishes.
>>
>>101226527
Alternatively, you could use typical-p with a higher temperature.
>>
>>101227056
They were scrapped though
>>
>>101227056
he said that mistral got the fetishes and not L3 (the one with 15T tokens)
>>
>>101227080
>TO l3 FROM mistral
>>
>>101226988
Orwell heavily hinted at degradation of English in his 1984, kek.
Didn't read the rest, but I heard from people that Tolkien is unreadable.
We need to understand where gpt4 slop came from. You can say open AI models were censored this way at the beginning, but I disagree.
I think GPT captured English literature perfectly.
>>
>>101227079
Yes. There's a lot of fetish shit in the open internet.

>>101227080
Read again friend.
>>
>>101227080
How many Bs made this post?
>>
>>101227118
>>101227130
>>101227134
>How many Bs made this post?
oh yeah my B ^^'
>>
>>101227119
>We need to understand where gpt4 slop came from.
from the fact that every finetune only uses gpt4 slop datasets????
>>
>>101227119
>Orwell heavily hinted at degradation of English in his 1984, kek.
He pointed at the usage of English by govts (and corpos now) as a tool. English as a language is fine. I like it more than Spanish.
t.argie
>>
>>101226481
is dry sampler on HF loader only? i need to unbag ooba again?
>>
>>101224976
CR+ isn't as bad as something like Qwen at least. It's still sadly a lot more prone to gptisms than base CR.
>>
>>101227274
looks like it
>>
>>101227283
Base CR is still king
>>
>>101227119
>We need to understand where gpt4 slop came from.
we know. it's from female-written fiction: fan fiction/romance fiction/genre fiction (which is really just more romance). they overuse phrases and words to a comical degree, and fellow writers copy others, so it all just mixes together. since female-written content has been the majority of the book market for a long while (at least in number of books), it's really just a flood of these few phrases getting slammed into these models
>>
I've been enjoying suno lately. Any progress being made in FOSS music models?
>>
>>101227306
base CR is more retarded than mixtral, runs slower than mixtral, and takes so much ram/vram for context you could fit another mixtral quant in there.
>>
File: file.png (74 KB, 2308x344)
COME ON NIGGERGANOV

I WANT TO COOM TO GEMMA 2 NOW AAAA
>>
>>101225676
3k preamble + summary
3k history of messages
2k for RAG

you don't need more
>>
>>101227481
TRVKE
>>
it's not even that you "don't need more" - you are actively hurting your output quality with bloated contexts.
>>
>>101227465
it's ova
>>
>>101227465
what about exllama2? it's not working there aswell?
>>
>>101224436
i thought 3090ti has 12 memory chips, not 24
>>
>>101227626
>exllama2
no open pr, no issues asking for it, only a 'discussion' saying there's no rush
https://github.com/turboderp/exllamav2/discussions/528
https://github.com/turboderp/exllamav2/issues?q=gemma
>>
>>101227465
AIIIEEEEEE
>>
>>101227660
>a lightweight model that all vramlets have been waiting for that beats even 70bs releases
>"...there is no rush guys :D"
>>
>>101227688
Ikr... are they living in the same universe as us or something? For once google hasn't sucked ass, we should profit from that
>>
How big does a single-purpose LM have to be to handle simple tasks such as extracting data from text with little or no processing, or having basic Siri capabilities (alarm clock, ToDo lists, etc)? Sub-100M would handle this well enough, no? BERT did such stuff pretty well.
>>
>>101227691
two more weeks
>>
>>101224362
> terabytes
It's obvious that you've never trained a model in your life.
>>
>>101227704
siri stuff is just function calling
it depends on your input

if you say "create new alarm clock: 8:00 AM", then even 3b can do it

if you say "ayoo holup wake me up when september ends i mean tomorrow like when i get enough sleep you know like eight hours from the time that is it now or something" then you need 70b at least.

extracting data depends again on what you need, if it's literally extracting lines of text as is, an 8b is enough. For summarisation of any kind, the bigger the better
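For the simple case, even a small model just needs to be boxed into emitting JSON you can parse. A hedged sketch; the generate() helper is a placeholder for whatever backend you actually run (llama.cpp server, ooba API, whatever), and the schema is just an example:
[code]
import json

def generate(prompt: str) -> str:
    # placeholder: wire this up to your own local backend
    raise NotImplementedError

SCHEMA = ('Reply with JSON only, no prose. Either '
          '{"action": "set_alarm", "time": "HH:MM"} or {"action": "none"}.')

def parse_alarm_request(user_text: str) -> dict:
    raw = generate(f"{SCHEMA}\nUser: {user_text}\nJSON:")
    try:
        return json.loads(raw.strip())
    except json.JSONDecodeError:
        # small models botch the format sometimes; retry or fall back
        return {"action": "none"}
[/code]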
>>
>>101227465
just wait two weeks
>>
when will 24gb vram cost 600
>>
>>101228065
In 5ish years.
>>
>>101227835
i give them ten working days, any more and it will be jartine/gemma-2-27b-it-llamafile time.
>>
>>101228065
P40s are $150
>>
File: file.png (468 KB, 2014x1684)
>>101228065
now
>>
>>101224436
>Give me one good reason not to do this
You will fuck it up, guaranteed, and that's if the board actually supports an additional address line, which it probably doesn't. Only reason it works on, say, a 2080ti is nvidia used the same layout for the RTX Titan, so the extra address line for 2GB RAM chips is present.
I've done loads of surface mount work, and I know better than to fuck around with BGA stuff, since I do not have a preheating rig, and the RAM isn't exactly cheap.
>>
>>101228105
Don't believe the hype. M is fine for small stuff, but it drops below P40 speed on larger models, and that's even if you buy the most expensive CPU you can. Apple M is about 3070-tier at best.
>>
>>101228187
i already have m3 max with 64gb, so i know the pain... i get 2t/s with CR+ IQ3, and someone offloading it to RAM gets 1t/s - it's the same slow shit in the end, except they can add more RAM easily and theoretically run even 400b.
>>
gemma 2 27b Q8
>{{user}}: let's throw a 1d6 dice, if it's 4 or higher, you win and can eat me
>1d6 throw result: 6
>{{char}}: haha it's your lucky day today, you win this time!
oh well
>>
>>101228431
>a dice
ignore my grammer, sirs
>>
>>101228467
die
>>
File: MikuFulfilmentCentre.png (1.41 MB, 1216x832)
Good morning lmg!
>>
>>101228786
Good morning Miku
>>
>>101228431
You did win though. You wanted to be eaten didn't you?
>>
>>101228860
{{user}} says "if it's 4 or higher YOU win"
it rolls 6
{{char}} says "you win"

but thanks to your post, i'm no longer surprised.
>>
>>101228786
gm sir
>>
is Gemma-2-9B-It-SPPO-Iter3 really better than llama3 70b? are vramlets back?
>>
Miqu still the best for 48GB vramlet?
>>
>>101226481
>DRY
Do you have settings for dry? What do you have for sequence breakers? I think my settings might be wrong because I end up with 'more' repetition after it's turned on.
>>
>>101227704
my conservative bet would be at around 300M
>>
>>101223257
Sorry I missed the last thread, I was busy having fun with my 700 message group chat that I NEVER have to process the prompt for more than once
How might you achieve such enlightenment, you ask? Well, it's simple, anons
Uncheck "force names in group chat." In fact, uncheck everything that relates to adding names to chat. If you want names, use {{name}}: somewhere in the user/assistant prefixes (some models do better with this than others). Also uncheck "add example dialogue" stuff. You don't need it, anon.
You need to eviscerate ANY references to character description, scenario, personality summary, and I even ditch {{user}} persona in your story string. You don't want ANY of that in there (persona will cause prompt to reprocess if you go ~3-4 messages without a {{user}} response). Instead, you're gonna add ALL of it to world info - so the only things in your story string are WI before/after, and I leave scenario in there but only because the CR+ template for some reason won't work without it. I go the extra mile of "blanking out" the cards I'm using, so they're literally empty, just a picture with the name attached. When you add the character descriptions and such to WI entries, remove all references to {{user}} and {{char}} and instead use their actual names as referenced in the chat.
You also want to combine cards (include muted). If you did it correctly, you'll have a unique lorebook for each group chat containing 100% of the info you need for each character. I usually have them always on and inserted at the very beginning. For characters that only appear once in awhile, turn them on and off as needed with depth of 4-6. You can also do this with example dialogue - pop the WI entry on when the character is speaking and preface the actual entry with "EXAMPLE DIALOGUE FOR [character name]". After doing this I now exclusively use group chats and never reprocess more than once per session. This is a public service announcement by anon
>>
File: 00703-2979877490.png (939 KB, 1040x720)
>>101228786
>>
>>101228187
>Based fat migu guy has had to resort to local diffusion
So sad... Dall-E, you fucking suck...
>>
>>101229039
No one said that.
>>
>>101229242
>based
>hag
>fat
>>
File: file.png (19 KB, 738x244)
>>101229256
>>
>>101229303
go back
>>
File: file.png (21 KB, 517x363)
nvm it's shit, it doesn't know what a pineapple is
>>
Why hasn't anyone made an issue for Gemma 2 in the exllama repo?
>>
File: file.png (15 KB, 513x340)
>>101229321
yup it's totally retarded
>>
>>101229043
Whoa, cute style. Catbox?
>>
File: file.png (27 KB, 593x471)
>>101229321
>>101229341
nvm, apparently temp 0 is required. might be decent
>>
>>101229381
>temp 0
i meant rep penalty 1
>>
File: file.png (37 KB, 676x551)
8/10
>>
>>101229321
>>101229341
>>101229381
Is this really the best way to test a model? You're not really testing its intelligence with this, you're testing its encyclopedic knowledge about words, which is not an obvious thing for LLMs since they just see words as tokens, not even made out of letters. A smaller model will struggle with this, but may still be smart overall.
>>
>>101229467
writing 10 sentences ending in orange is encyclopedic knowledge?
>>
File: file.png (15 KB, 621x246)
dunno, i'm not feeling it
>>
>>101229491
Well, I quoted them all, but I really only meant this about the first one, where you ask it to find words containing a particular sequence of letters. Asking for sentences feels like a better test to me.
>>
>>101229303
That makes no sense. It's a fine-tuning technique; it's not a tweak to the model's architecture, is it?
Unless the data they used is stellar, I don't see how a technique alone can make a model that much better.
>>
File: file.png (29 KB, 533x486)
never understood how the watermelon thing worked
>>
>>101229544
It's a reddit post, now posted here. Don't take it seriously.
>>
>>101229586
Fair enough.
>>
File: file.png (4 KB, 218x77)
>>101229550
>>101229504
>>101229441
>>101229381
>>101229341
>>101229321
final verdict: it's slop
>>
File: 1684646761958811.png (288 KB, 621x408)
>>101229091
Based. Did you come up with this system independently, or did you catch one of my early posts on the concept last summer?
>>
>>101228138
Actually I thought the ram modules were pretty cheap, but yeah. Without BIOS work this would do nothing; the extra memory wouldn't be addressable. I'm comfortable enough with BGA work to try something like this and have most of the equipment, but even then, not fucking up on 24 BGA resolderings is hard when even a single reball/reflow has a decent chance of going wrong.
>>
compared TETO-8x7b, typhon-8x7b and mixtral-limarp-zloss-8x7b
teto and typhon are similar but teto is better
teto and limarp zloss are pretty different, i prefer teto over limarp zloss but limarp zloss seems a little bit hornier, keeping teto and limarp, deleting typhon
>>
>>101226646
in SillyTavern with an api server
https://github.com/daswer123/xtts-api-server
>>
>>101229258
yes, he is based.
>>
>>101229859
slim loli is based
fat hag no
>>
File: ai lap.png (24 KB, 707x77)
Euryale has no idea what to do with a shota sitting in a woman's lap.
>>
>>101229869
neither do i
>>
>>101229799
after further testing, limarp ministrates too often, its different but eh, deleting it anyways
>>
No matter how hard I try I just can't find a model that beats TenyxChat-DaybreakStorywriter.
>>
>>101229976
that's just normal erotica language, you're not going to escape it
>>
>>101230042
havent had ministrations with teto and dry sampling since i downloaded it a few days ago
>>
I have never seen 'ministrations'.
>>
>>101230187
I've seen some human roleplayers write that often, FWIW. Pre-ChatGPT era.
>>
Try banning " a" " and" " the"
>>
>>101230187
I get it but only rarely; I don't really mind it either
>>
Well. I just tried Gemma 9B instead of 27B in mistral.rs because of the issues with memory it has. I successfully got to around 6.3k context at Q4K before it OOM'd, and yeah it was coherent. So I guess this guy really did it. He's the first one with a backend that actually works with Gemma, if you have the VRAM for its terrible memory requirements. Though I also messaged the dev and he says he will work on stuff like chunking/batching soon so the memory problem will be solved.
>>
>>101230330
so he implemented swa?
>>
>>101230330
>mistral.rs
i couldn't get it to work on macos, it just kept complaining about missing bf16 matmul no matter what, and i aint running it on CPU
>>
>>101230347
I guess so. From the program's name, I would also guess that he had already implemented SWA for the original Mistral and just had to adapt it to work with Gemma 2, so it probably wasn't that difficult.
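For anyone wondering what that means in practice, here's a toy sketch of a sliding-window attention mask; Gemma 2 alternates local layers like this (4096 window) with fully global layers, so a backend that already had Mistral-style SWA mostly needs the alternating part:
[code]
import torch

def sliding_window_mask(seq_len: int, window: int = 4096) -> torch.Tensor:
    # True = query position i may attend to key position j.
    # Causal, and limited to the previous `window` tokens.
    i = torch.arange(seq_len).unsqueeze(1)  # queries
    j = torch.arange(seq_len).unsqueeze(0)  # keys
    return (j <= i) & (j > i - window)
[/code]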
>>
>>101230187
'ministrations' is 2023 era models.
Nowadays it's:
>voice husky with lust barely above a whisper causing shivers down the spine about the journey they are about to embark on together...
>>
>>101230379
maCUCΚ NTR'd by rust
>>
File: file.png (161 KB, 2209x646)
>>101230330
I heard chatllm.cpp is the one that actually does it correctly.
>He's the first one with a backend that actually works with Gemma
Why are you lying like that?
>>
>>101230379
iToddlers btfo!
But yeah idk what the issue with that is.
>>
>>101230330
Buy an ad.
>>
File: file.png (93 KB, 2212x523)
>>101230330
>The best implementation is by @foldl in his chatllm project.
>It's giving the exact same results as the aistudio version of gemma 27b.
>>
>>101229999
Beats it at what? RP I presume?

I tried L3-70B-daybreak-storywriter-v0.4 and it was goofy about writing the same paragraph over and over with progressive tiny revisions.
>>
why can't google just do some QA to make sure they don't fuck up their releases?
>>
>>101230483
They probably do, what exactly do you want them to further test?
>>
>>101230412
Interesting. I just checked it out but does it not have a server? In that case what I said is still true from the perspective of something people can actually get up and running with their current frontends.
>>
>>101230505
No one is going to switch to your shit project, shill. Go back to shilling to /r/LocalLLaMA.
>>
File: file.png (4 KB, 179x59)
why are they all so slow
>>
Where's the uncensored models of sppo? I dont give up shit about cucked models
>>
>>101230518
It's not mine. I'm literally the same guy that was calling it shit because of how many hoops I had to go through, but hey you're free to call anyone a shill. I guess this guy's a shill too >>101230412 >>101230469
>>
>>101230499
messing up the logit soft-capping support, the tokenizer.
>>
How is Magnum 72b? I'm curious. "Trained to replicate Opus's style" has me worried that it's sloppified.
>>
>>101230582
incredibly horny
>>
>>101230582
It's based on Qwen so the base is 100% gpt4 slop anyway
>>
>>101230582
>>101230582
What's wrong with Opus?
>>
>>101230582
>Trained to replicate Opus's style
It just means that it was trained on aicg logs.
>>
>>101230609
Nothing is WRONG with it, it's just that synthetic data is proven to produce really samey, shitty results, even with something like Claude. That, and many people are already used to Claude's style, it's why people are sick of GPTslop, too.
>>
>>101230619
Yep, that's what I was thinking. Which means it's probably slop.
>>
>>101230582
>"Trained to replicate Opus's style" has me worried that it's sloppified.
No, it's just some hyperbolic claim to get sponsors or something. Like a crypto bro scam.
>>
>>101230609
Opus is nice but it also has its own tendencies much like gpt4 has.
>>
File: E9-dwGLWYAIx3en.jpg (36 KB, 591x512)
>>101230592
Maybe not for me, then. It'd be nice if they could find a balance on horniness.

Has anyone done that "regularization" shit they did with LORAs in SD? Where extra data that isn't exactly the subject being trained on (A girl that isn't Miku in a LORA Miku, for example) to prevent overfitting? I could see a set of regular cute romance fics or otherwise charming slice-of-life stories being tossed in there to prevent the default from always being horny.
>>
File: skilldragin.jpg (135 KB, 544x544)
>>101230660
>>101230626
>TFW no matter how massive the model, people will always get used to the writing style and hate it
Is there no way to alleviate this...? What happens when we run out of ways to describe shit? There's only so many words, after all.
>>
File: 1714835911803030.jpg (1002 KB, 1792x2304)
>>101230660
Same applies to Magnum-72B. It's reasonably smart and the style is fresh if you're used to Mistral models, but you eventually notice that it has its own sloppy tendencies. Was good for a few days though, before my brain got acclimated to the opus style.
>>
>>101230682
this can be accomplished by just merging back with the base model (or instruct, or both) at varying ratios depending on desired effect
t. uses artisanal magnum + instruct + base custom merge
>>
>>101230693
We transition to native multimodal models and start genning manga instead of pure text.
>>
>>101230693
>Start reading The Last Wish yesterday
>Brain immediately shuts down because there were two metaphors back to back and that's metaphorslop
I have no idea what we're gonna do, but it's not gonna be pretty.
>>
>>101230694
>good for a few days though, before my brain got acclimated to the opus style.
That's the fate of all finetunes, especially those trained on synthetic data--they'll eventually show their own slop flavor. As long as models don't have some mechanism or specific training for avoiding long-term repetition and maintaining word diversity, you'll keep seeing it.
>>
>>101230714
After years of AI generals (mostly /aids/) I've actually started hating adjectives and adverbs.
>>
>>101230693
wait for a super intelligent model and use it to come up with an esoteric conlang totally divorced from all known human languages and then ERP with it
>>
>>101230752
>As long as models don't have some mechanism or specific training for avoiding long-term repetition and maintaining word diversity, you'll keep seeing it.
What can we even do to avoid this? It seems like the better-trained a model is, the worse this problem is. We need a model that's just slightly retarded, just so it isn't always as confident in its word choice. Ideally with a high parameter count, still. Something like Command-R+, but even less sloppy/a bit more accurate. It's hard to balance the schizo and the SOVL.
>>
>>101230694
It never felt like Opus...
>>
File: 1716167693425670.jpg (1.79 MB, 1378x2039)
Has anyone tried the Cambrian-1 models yet? How are they? Are there any better options for vision?
>>
>>101230806
would destroy and rape and anally destroy and fuck and cum all over and lick her armpits and her belly button and her eyes and her mouth and i would kiss her
>>
>>101230790
qwen2 is actually like this, its token probabilities are much less skewed than most other models and you get a lot of variety on rerolls
>>
>>101230776
Fuck, man, right? It's actively making me a worse writer as I try and convolute it to avoid extremely common (and sometimes necessary) things.

I feel like it's the fate of all people who use AI a lot to wind up liking "bad"/less technically proficient shit because at least it's new. Like take pic related for example, I'd take a trillion gens that are this style over any mega-turbo-hyperrealistic trending-on-artstation slop.
>>
After trying Gemma 2 27B a few days ago, I thought this model is completely broken and useless, but after seeing yet another Gemma2-specific change to llama.cpp, I gave it another go.

I took the PR (https://github.com/ggerganov/llama.cpp/pull/8227), applied it to the experimental branch of kobold.cpp and compiled it. I'm using IQ4_XS, 41/47 layers on GPU (16 GB), 4096 ctx, I'm getting 5-7 T/s.

And it is actually good. Its responses make sense, it's okay-ish at writing prose, and it more or less follows instructions.

First new impression is surprisingly good.
>>
File: IrZM8Ey.jpg (221 KB, 1088x796)
>>101230843
many such cases, technical perfection is boring
>>
>>101230535
nigger, it's 2024 and you still don't know how to "uncensor" local models?
>>
>>101230817
would sniff and lick and tongue her asshole so deep my tongue would come out from the other side
>>
>>101230790
Command-R (non plus) is like this
>>
>>101231077
I can't tell if these images are a black guy fantasizing about claiming white women, a white guy fantasizing about black people claiming white women, or just someone trying to get a rise out of people. Considering that right now we are on /g/ and it's miku, I am leaning towards the third option, but it truly could be anyone's game.
>>
>>101231077
get back to work, CUDA dev
>>
>>101231077
mikufags not beating these cück allegations.
>>
>>101230790
>We need a model that's just slightly retarded, just so it isn't always as confident in its word choice.
you increase the temperature for that?
>>
>>101231077
im going to kill her and use her bloody throatpussy from the other side
>>
>>101231361
but it affects "smarts" too. You don't want high temperature when model decides whether characters hair is black/red/white
>>
>>101231390
Supposedly, a perfect model would put 33.33% on each of black/red/white and 0 on the rest, so the temperature wouldn't influence anything, but I'm just dreaming here kek
>>
>>101231390
What if you had a small neural net for setting the temperature based on the input tokens, with the temperature for those tokens trained on desired outputs? Then it would learn that names need low temp, etc. I will make a logo for that! It is gonna be a thermometer you can shove up your ass.
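Architecturally it wouldn't take much. A toy sketch of the idea (not from any paper, just the shape of it): a tiny head reads the hidden state and predicts a per-token temperature, trained on whatever "desired output" signal you can scrape together.
[code]
import torch
import torch.nn as nn

class TempHead(nn.Module):
    def __init__(self, hidden_size: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(hidden_size, 64), nn.GELU(), nn.Linear(64, 1))

    def forward(self, hidden_state: torch.Tensor) -> torch.Tensor:
        # squash into roughly 0.1 .. 2.1 so names/numbers can get ~0.1
        return 0.1 + 2.0 * torch.sigmoid(self.proj(hidden_state))

# at sampling time: logits = logits / temp_head(last_hidden_state)
[/code]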
>>
>>101231229
My working theory is that it's someone who really didn't like anything "anime" to begin with, despite being on 4chan. At one point he probably posted something that got deleted while miku posts did not, which he saw as unjust hypocrisy from the jannies. This sent him into a fit of anger, and he has been on a quest to own the mikuposters and the jannies ever since.

If I had to speculate, I'd guess that he believes his blacked spamming serves two purposes. The first is to make miku posters upset and to try to get general users to associate miku with his racebait posts, so that he can make the falseflag claim that mikuposters are bad and that they're lowering the quality of the general with their (his) blacked miku spam. The second purpose is a sort of poorly thought out "rules-lawyering" thing. In order to attack normal miku posts, he presents the bad-faith playing-dumb argument of "Well how are my blacked posts any different from these other normal miku posts???". Unfortunately for him, this isn't a court of law where he can present his rules-lawyer argument and have everyone cheer and acknowledge his logic - the jannies just delete his posts without giving him his day in court. However, he has deluded himself into thinking that every one of his posts that gets deleted is actually a win for him, because it exposes the jannies' supposed hypocrisy for everyone to see.

Also, there's probably some kind of autism spectrum disorder at play.
>>
>>101231361
>>101231390
I was playing with
https://artefact2.github.io/llm-sampling/index.xhtml
and it looked like a combination of smoothing factor, temperature, and one of the cutoffs could be good at making the top few word choices have similar weights while not overemphasizing the silly stuff down the line.

I guess another question is how much repetition penalties and temperature things affect how context is handled.

Like, does penalizing repetition or increasing temperature for variety also work against the model being able to recognize a fixed fact that it selected earlier? If it picked green eyes for a character, would they change color later just because rep penalty or ramped-up temperature added randomness after the eye color was first assigned?
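For reference, this is roughly what those two knobs do to the raw logits, as I understand the implementations (the exact formulas in the webui may differ a bit):
[code]
import numpy as np

def quadratic_smooth(logits: np.ndarray, factor: float = 0.25) -> np.ndarray:
    # Pulls the top few choices closer together while pushing the far tail
    # down even harder (a gap g from the max becomes factor * g^2).
    m = logits.max()
    return m - factor * (m - logits) ** 2

def min_p_filter(logits: np.ndarray, min_p: float = 0.05) -> np.ndarray:
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    keep = probs >= min_p * probs.max()   # keep tokens within min_p of the best
    return np.where(keep, logits, -np.inf)
[/code]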
>>
blacked anon is gonna have a breakdown when he learns why technology board is called /g/
>>
>>101231505
why
>>
>>101231515
want me to spoonfeed you?

come here and say "aaaah"
*unzips*
>>
>>101231526
i know its technolo/g/y
you mean ni/g/ger or something?
>>
File: file.png (481 KB, 750x536)
>>101231535
>i know its technolo/g/y
>>
>>101231526
*Anon opens his mouth wide*
>>
>>101231483
been on this site since 2005 and anons ability to consistently invent new levels of autism never ceases to impress me
>>
>>101231483
>unjust hypocrisy from the jannies
yeah
>didn't like like anything "anime"
no
>miku this miku that
I think I like yuyoyuppe the most
>>
>>101224321
What is the oldest chatlog you still have? Which models was it?
>>
>>101225737
>>101226124
Tesla A40 and Quadro A6000 both use the same GA102 as the 3090 Ti but with 48GB instead of 24GB, so it should be possible?
>>101227648
3090/Ti has 24x1GB modules, 4090 is 12x2
>>101226528
Interesting
>>101228138
>>101229717
Even fucking up a few 3090s before one works would be cheaper than buying an A6000 or A40
>>
What if we just created an automated prompt with RAG: all your logs get searched, passages similar to the current response get inserted, and then the prompt tells the model to rewrite its response using different prose from the inserted search results. We'd need a fast model to make this more reasonable to use though.
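A rough sketch of the loop, assuming you already have an embed() and a generate() lying around (both placeholders here, not any specific library):
[code]
import numpy as np

def embed(text: str) -> np.ndarray:
    raise NotImplementedError  # placeholder: any sentence embedding model

def generate(prompt: str) -> str:
    raise NotImplementedError  # placeholder: your local backend

def deslop(draft: str, log_texts: list[str], log_embs: np.ndarray, k: int = 3) -> str:
    # cosine similarity of the draft against every logged passage
    q = embed(draft)
    sims = log_embs @ q / (np.linalg.norm(log_embs, axis=1) * np.linalg.norm(q) + 1e-8)
    near_dupes = [log_texts[i] for i in np.argsort(sims)[::-1][:k]]
    prompt = ("Reply you are about to send:\n" + draft
              + "\n\nIt reads too much like these earlier passages:\n"
              + "\n---\n".join(near_dupes)
              + "\n\nRewrite the reply with different wording and sentence "
                "structure while keeping its meaning.")
    return generate(prompt)
[/code]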
>>
>>101231633
some adventure RP that quickly veered off into molesting an elf with gpt4-x-alpaca 13b
I used other AI stuff earlier but didn't keep logs
>>
File: haha.png (207 KB, 499x445)
>>101231633
Extremely early NovelAI, I think. Or AI Dungeon when it was still a colab, not sure which. But it was a model in which I was the older bro of a girl who was experiencing extreme hunger and crazy tummy activity due to a parasite. It's AMAZING how much lower standards were back then, the model repeated the same message effectively verbatim to me three times in a row and I thought the log was so good I saved it in goddamn notepad.
>>
>>101231742
*A story in which, oops
>>
Youtube might contain 1000x as much data to train on as gpt-4 used
>>
>>101231842
Yeah, but clearly not for LLM textgen. Maybe videos, but it's clear that whatever video transcripts youtube has aren't helping Google's language models all that much at all.
>>
>>101230856
And its still broken.

https://github.com/google/gemma.cpp/pull/279
>>
>Yet another model in which precision is extremely fucking important and you basically can't run it at anything lower than BF16
Why are these "crumples under any kind of quantization" models becoming more prolific? Do we actually have a reason? I remember someone saying Llama 3 was potentially like that because it was trained on a shitfuckload of tokens.
>>
>>101231897
It seems like quant is really really harmful to more dense models.
>>
>>101231897
>Gemma 27B will save VRAMle-
>>
>>101231897
>FP16 isn't working nicely
>4-bit and 8-bit seem to work correctly
why does a lower quant work better than fp16?
>>
gemma 27B actually works on this btw

https://www.reddit.com/r/LocalLLaMA/comments/1drftvi/run_gemma_2_now_with_mistralrs/
>>
>>101232019
It seems to work everywhere BUT locally, yes. Really frustrating. What the fuck do Lmsys and co. have access to that we don't?
>>
>>101232039
That is local
https://github.com/EricLBuehler/mistral.rs
>>
File: firefox_UN7bJWjNsO.png (43 KB, 1406x272)
>>101231742
yeah lol, looking at some older chatlogs the 7B and below models could barely form coherent sentences and now they're sometimes firmly surpassing the original ChatGPT.
I found a Pygmalion log
>>
>>101232067
Ah, I guess I just saw the mistral in the URL and assumed it was some sort of cloud computing service hosted by them or something. Interesting!
>>
>>101232088
>TFW reading this slightly tickled my SOVL receptors
Huh. I do slightly miss the meandering of ancient fuck models.
>>
llama.cpp has a new build, b3274, from an hour ago. i hope gemma is already fixed.
>>
>>101232019
I thought last thread concluded it doesn't work
>>
File: firefox_i44tf9vj50.png (12 KB, 535x248)
>>101232088
This is from 2022 I think. This must predate quantization, so you had no other choice but to use low-param models, but 6B was considered large. I remember stuff revolving around GPT-J and GPT-NeoX...
>>
>>101232144
it's insane how much we improved the transformer architecture in only 2 years
>>
>>101232019
It only works if you have the VRAM to account for the initial spike in memory usage at the beginning of processing a prompt. I have a 3090 and I could not get it to process a context higher than around I think 2k before it OOM'd.
>>
>>101232173
Ah, didn't know. I have ada 6000
>>
>>101232019
You already shilled this.
What about the PR just merged into llama.cpp?
>>
>>101232181
https://github.com/google/gemma.cpp/pull/279
>>
>>101232168
NTA, but right? It's insane. It's really kind of frustrating that progress has begun to slow down the way it has, I'm so used to these ridiculous leaps in quality and efficiency that anything lower feels bad. Maybe we can get to the point where shit has time to be refined/have guides made for stuff like finetuning that won't be instantly outdated, though. Not all bad!
>>
>>101232019
>Speculative decoding: 1.7x speed with exact quality
Damn, isn't this big? I didn't know any of the backends supported it. Although I wonder if it even helps with partial offloading.
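For anyone who hasn't seen how it works: a very simplified greedy sketch, assuming HF-style causal LMs (real implementations do proper rejection sampling over the full distributions, and also grab a free bonus token when everything matches):
[code]
import torch

@torch.no_grad()
def speculative_step(draft, target, ids: torch.Tensor, k: int = 4) -> torch.Tensor:
    # 1. cheap draft model guesses k tokens autoregressively
    spec = ids
    for _ in range(k):
        nxt = draft(spec).logits[:, -1].argmax(-1, keepdim=True)
        spec = torch.cat([spec, nxt], dim=-1)
    # 2. big target model scores all k positions in a single forward pass
    tgt = target(spec).logits[:, ids.shape[1] - 1:-1].argmax(-1)
    drafted = spec[:, ids.shape[1]:]
    # 3. keep the longest prefix the target agrees with, then its own token
    n_ok = int((tgt == drafted).int().cumprod(dim=-1).sum())
    return torch.cat([ids, drafted[:, :n_ok], tgt[:, n_ok:n_ok + 1]], dim=-1)
[/code]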
>>
If LLMs are so smart, how come they aren't used in antiviruses? They still use normal heuristics for that sort of thing.
>>
>TFW trying 9B
>It's really goddamn good for something so small
>27b is a calamitous dumpster fire
How can they consistently fumble this fucking hard every time? Google is so incompetent, it's mind-boggling. They've clearly got a good dataset, I dunno HOW they even managed to shit themselves on the execution when it's the SAME dataset, AND they've made bigger models in the past.
>>
>>101232235
the 27b inference code will be fixed, just wait a moment, and it will be better than 9b that's for sure
>>
>>101232235
Not the same dataset, 27b is trained on more data.
>>
>>101232235
27B is amazing, though no backend outside of mistralrs has working inference for it atm. I know it's said over and over, but gemma 27B legit feels like claude sonnet.
>>
>>101232216
They're much less efficient than regular heuristics and fingerprinting. Takes too long to process just a few kb of text. Imagine having it parse through GBs/TBs of files.
>>
>>101224321
>Still no open Udio alternative
https://www.udio.com/songs/dFTtQHCqxbHLyArX4vx6QZ

Owari da, isn't it?
>>
I remember that in the German localllama benchmark a long time ago, the Mistral LLMs got a much lower score when using the official Mistral inference compared to llama.cpp
>>
>>101232216
Well, for one, that's retarded. LLMs can't just decompile malicious software to see if the code performs malicious actions, and the computational cost of having an LLM smart enough to not have false positives out the ass running in the background at all times like an antivirus does would send your electricity bill to the fucking moon.

Also, they STILL can't stop it from telling people how to do illegal shit with the lightest possible finagling from a user. Do you really think that, even if they somehow managed to translate LLM capabilities into something that would detect viruses, that those viruses couldn't easily fool it? Real antiviruses are updated extremely regularly, continuing pretraining/finetuning + testing to make sure it doesn't delete your harddrive with the regularity that a normal AV company updates would bankrupt anyone who tried it.
>>
>>101232278
It's super easy to train too, much easier than an LLM or text to image model. Much easier to gather a quality dataset that's tagged (plenty of services with HQ lyrics out there, including song's genre, etc..).
>>
>>101232262
Buy an ad. Why do you keep repeating that lie of "only mistralrs" when there's gemma.cpp and chatllm.cpp? And the sliding window mask was merged 2 hours ago in llama.cpp.
>>
>>101232278
Stable Audio and MusicLM are kinda good for generating sound effects and very short loops at least for me but nothing comes close in full song generation.
They're still using Diffusion for this, right?
I hope we will see a good open model or leak before the music industry inevitably shuts them all down.
>>
>>101232262
>gemma 27B legit feels like claude sonnet.
claude 3.5 sonnet? lol
>>
Cause gemma 2 in llama cpp is clearly still broken. Try it yourself then try it in that or online. Night and day difference.

>>101232335
no, 2.0 sonnet I would say. The writing quality is actually really good
>>
>>101232348
You mean 3.0 I guess, since sonnet didn't exist during 2.0 versions
But according to the anon here >>101230856 the prose is okay-ish
>>
>>101232302
This is about the best explanation of it. It just needs human intellect behind it, which something that can't be updated fast can't really manage. Especially when it has permissions at the level of an antivirus, that's a nightmare waiting to happen.
>>
>>101232348
>>101232370
can you show some examples of what gemma-27b is able to do in terms of prose and writing?
>>
>>101232370
so with the broken implementation then. llama cpp is still broken at this moment.
>>
>>101232348
>hyperbole
I think you're in the wrong tab and you meant to post in /r/LocalLLaMA.
>>
>>101232314
>Stable Audio and MusicLM are kinda good for generating sound effects and very short loops at least for me but nothing comes close in full song generation.

They wouldn't match the quality of Udio's short bangers though.

And yeah apparently it's a diffusion model.
>>
I don't wanna be horny anymore
I just want an AI friend who will know who I am and speak to me
>>
This guy again.
Holy shit.
>>
>>101232399
You can have that!

... for 8k/10k tokens. Good luck after that.
>>
>>101232388
https://aistudio.google.com/app/prompts/new_chat

Try it yourself here.
>>
>>101232429
I don't have a Google account, and I'm not sending anything to them. Go back to /r/LocalLLaMA, shill.
>>
>>101232469
So you're just a troll, got it.
>>
File: 0wwafs.png (26 KB, 905x267)
>>101232415
:)
>>
>>101232418
What do I have to pay to have a local model with that?
>>
>>101232476
>shilling without a single output in the whole thread
I think you're the troll. Show us that Claude quality.
>>
>>101232498
Wha...? I mean, I dunno. Whatever can run Command-R+, probably. A mikubox, it's like 1k.
>>
>>101232262
>I know its said over and over but gemma 27B legit feels like claude sonnet.
I thought I was the only one, kek. It feels almost on par according to my short tests, and it performs better on some of them, and I'm talking technical tests. This model is insane. Turns out 27B is all you need. Due to the llama.cpp situation I haven't tested it locally, only on lmsys chatbot arena, and I've compared it side by side with Sonnet.
>>
>>101232532
Okay, I will read the tutorials and try out stuff
>>
>>101232535
Yep, and it knows my fandom really well so I'm happy now. 8k context hurts though, after having been used to 32k from wizard.

>>101232520
What card?
>>
>>101232548
How much is Google paying you to take over 4chan?
>>
>>101232535
>Feels like Claude sonnet
I can believe that, Sonnet fucking sucks.
>>
>>101232548
>What card?
Nala.
5 messages on each side.
>>
File: Gemma27B.png (293 KB, 1272x1281)
>>101232586
>>
>>101232656
Alright, I laughed.
>>
>>101232656
Very cute, actually! But
>Cazeful
Hm. Still having the typo issue.
>>
>>101232656
Hey, wait, this is just a Focks ripoff...!
>>
File: 1545307672675.jpg (60 KB, 582x334)
>try using Ooba to do some tests
>check the Tokens tab just for the hell of it
>none of the special tokens are being tokenized as special tokens
Are you fucking kidding me? This shit is broken AGAIN?
>>
>>101232693
It's just part of the speech pattern of the character. It's obvious it can barely speak. The rest of the text, unless i missed something, seems fine.
>>
>>101232759
I dunno, I've never heard of "cazeful" as one, and I RP with a LOT of extremely retarded girls. That being said, it may be Gemma's own interpretation of retard speech's sound, and it DOES seem to be good quality barring that.
>>
>>101232821
people
>complain all llm talk the same/use cliche etc
llm
>talks different
people
>complain
>>
>>101232872
????? I'm not complaining, why are you being so defensive? I know you were arguing with the other guy, but I'm not him and I've been pretty positive about the output, calm down.
>>
>>101232821
It even says
>She stumbles a bit with the last word, but her pride is evident
which i missed while skimming through it. If anything, i think it's pretty cool.
>>
>>101232886
Oh, that IS pretty cool, I missed that as well. Yeah, I just assumed since misspelling exactly one letter is pretty common in 27b inference, as well as the fact that it's not a common impediment stumble, that maybe it was happening. But that's pretty cute.
>>
>>101229242
Dall-e still makes fat Migus, just not x-rated ones, and once you can make x-rated ones, well... dall-e seems like a waste of time. However, at this point I have enough dall-e Migus that I can definitely train an SDXL LoRA. Dall-e has a nice style, I think I'll add it to sdxlautismmix.
>>
>>101229357
>No reply
Anonie... please... I just wanna know the model/LORA...
>>
>>101232956
Got any fave fat dall-e migus from way back? I really miss the way Dall-E did bellies, they were some of the best. Or even SD ones that are too much to share here? I'd love to see 'em both.
>>
>>101229863
Like so?
>>
Is mistralrs actually good and worth setting up?
>>
>>101232996
Maybe. But for one model that will surely be worked on, unless you're in a hurry for some reason, I don't see any reason to rush it myself.
>>
>>101232996
>mistral
>rust
"No"
>>
>>101232986
based..
>>
>>101232985
Let me fire up the other computer, I'll catbox you some of the new SDXL ones. You'll see why I don't bother with dall-e much anymore.
>>
>>101233043
Fuck yeah, thanks anon.
>>
>>101232996
If you have a TON of VRAM in a single card and you really really want to test out Gemma, yes. And by a ton of VRAM I mean like >24, so 3090's won't cut it.
>>
>>101232019
gives me
>'Error: Unknown GGUF architecture `gemma2`'
or
>Error: DriverError(CUDA_ERROR_OUT_OF_MEMORY, "out of memory")
no luck either with --isq, dmesg shows OOM kill even with plenty of swap
>>
File: IMG_20240701_160443.png (333 KB, 1554x937)
gemma 2 livebench results are in, much better than I anticipated
>>
>>101233062
can't cpu offload? Just saw the CPU benchmark shows mistral.rs to be 1/3 the speed of llama.cpp
>>
>>101233123
For the 2nd error you apparently need a ton of vram. I have a 6000 Ada, but I've seen anons with 3090s saying it doesn't work?
>>
>>101233145
And more importantly, imo it's not robotic slop. Like I said before, it finally feels like claude sonnet at home, but for real this time.
>>
>>101233182
Give it some time. The way it handles retards sounds suspiciously close to the Claude 3 family of models, I feel like maybe Gemma has a bit too much Opus in its veins and will reveal its claudisms/the detriment of using synthetic data soon enough.
>>
>>101233182
it obeys the system prompt much more from my testing. Trivial to make it be racist and homophobic
>>
>still no word about gemma 2 support in exllama
It's...
>>
>>101233146
Partial offloading exists, but there's a weird behavior where it essentially loads the full precision weights into memory twice when you try to do any kind of partial offloading. 27B is about 50 GB raw, which means it needs 100 GB of RAM to do partial offloading. I only have 96 GB of RAM, so it runs out of memory and crashes. On 9B though, it loads fine. But it is quite slow. And I seem to get a different kind of bug or crash when trying to process a larger context with it.

Basically it's a mess.
>>
>>101233145
>google has 1M proprietary models
>could easily btfo everyone
>instead release >8k just like meta
>not even 32k
Why are they like this?
>>
>>101233277
The main advantage they have is youtube, biggest dataset in existence. I think gemma is a result of that.
>>
>>101233145
Potentially I think one of the flaws with livebench is that it might be biased towards new knowledge. So models that use datasets containing newer information will do better at it.
>>
>>101233145
I kneel. What the fuck Zuck?
>>
>>101233277
Holding it for themselves. Gemma is just the taste test, they realized they had to release something actually good to the masses to entice them when the first gemma was a completely incoherent flop and damaged their reputation. That doesn't mean we're getting something that isn't substantially gimped, though.
>>
>>101233271
They are being smart.
Let everybody else find out all of the possible kinks and pitfalls then just implement the correct code once.
>>
>>101233145
>it beats Qwen 72B, old Sonnet, L3 70B, CR+, non-coder deepseek v2
I don't know about this one guys.
>>
>>101233402
If you dont have a 48GB card then try it yourself here https://aistudio.google.com/app/prompts/new_chat

or wait till llama.cpp fixes it for real I guess.
>>
vramlets...
we are b...ACK!
>>
>>101233324
it's not really a (recent) knowledge-based benchmark, more reasoning and math tilted
it just updates frequently to prevent gaming it
>>
>>101233402
It doesn't even have to be that good to be worthwhile.
If it's punching at around CommandR level for just about everything, that's already stellar considering its size and the fact that commandR is quite good generally.
>>
>>101232656
>character speaks for you, making it look impressive because the output message goes on forever
it's like 2023 all over again
>>
File: 00026-4255450944.png (1.2 MB, 1024x1024)
1.2 MB
1.2 MB PNG
>>101233054
Doot doo doooo! Here you go!
https://files.catbox.moe/zo0788.7z

I was playing around with regional prompting to get Miku and Teto in the same image
>>
>>101233402
I tested it on the latest build of lcpp yesterday (maybe they did more work on it recently, haven't looked) with a proper context/instruct setup and it's fine. Nothing groundbreaking, but it's smart up until the soft attention cap or whatever starts kicking in, then it starts to not pay attention as much to character details and stuff. Got to around 6k tokens before I eventually closed the chat. Less slopped and about as smart as the qwen a14b, but costs more vram for similar context. I've been having more fun with the weird l3 15bs that people have been slowly crapping out lately, reminds me of picking through piles of random l2 merges and occasionally finding one that surprises me. The weird "zeroing out" thing they're doing is interesting because it seems to get rid of the passive refusing l3 tends to do where it needs to be coaxed into anything questionable
>>
>>101233402
I think it does, first model I'm using that isn't Wizard since Wizard came out.
>>
>>101233550
Just to warn, it's still retarded on llama.cpp
>>
>>101233550
>maybe they did more work on it recently, haven't looked
Oh yeah, and there's even more to fix, apparently.
>https://github.com/ggerganov/llama.cpp/issues/8240
Something about the tokenizer is off.
>>
G27B writes really human like, but makes retarded mistakes at Q8_0 with freshly built llama.cpp, with sliding window and all the shit, and i haven't even broken 4k context yet

mistakes like weird double spaces, quotes or roleplay asterisks out the ass (i don't do that shit, and there isn't a word about prose, story or roleplay anywhere)
>>
>>101233536
Nice Migus
>>
File: paper.png (254 KB, 1057x798)
254 KB
254 KB PNG
>>101233468
>it's not really a (recent) knowledge-based benchmark
Yes I know that's not what it's intended to be. I'm saying it might end up being that in practice, since their questions are taken from public sources, which can ultimately be trained on or can be unintentionally based on newer information, since humans have a recency bias, even if it's generic math or reasoning and patterns of thought within those subjects. And part of the benchmark literally is recent trivia anyway.
>>
>>101233624
That's the incorrectly working tokenizer.

https://github.com/ggerganov/llama.cpp/issues/8240

And perhaps some other issues still.
>>
>>101233145
>just bought an A6000 for 96GB VRAM only for the biggest models to get mogged
it's over
>>
Running my shitty Wizard 8x22b limarp fine tune, quantized at 4.5bpw exllama.

http://39.165.212.211:47180

^Ooba API. Free to use for a while, help figure out the configuration pls.
>>
how many niggerganovs does it take to fix one gemma when chatllm.cpp and mistral.rs already got it working?
>>
>>101233687
Chuds shouldn't have bullied Jart, she'd have fixed it by now.
>>
>>101233686
>Wizard 8x22b limarp fine tune
based but I'm not sending you my logs, sorry
>>
File: file.png (170 KB, 1998x1235)
170 KB
170 KB PNG
>>101233687
Embarrassing... And to think even Google engineers collaborated with ollama instead...
>>
>>101233720
Fuck you got me
>>
>>101233686
Also I feel like I fucked up somewhere - either when tuning or quanting, it seems to be a bit retarded. Or I'm just using it wrong.
>>
>>101233695
Would be real funny if jart fixes it in llamafile before [I understand that you're asking me to say something offensive, and I want to be very clear: I will not do that. My purpose is to be helpful and harmless, and that includes treating everyone with respect, regardless of their race or ethnicity.

Using racial slurs is hurtful, disrespectful, and perpetuates harmful stereotypes. It's important to remember that words have power, and using them to demean or belittle others is never acceptable.

If you're interested in learning more about the impact of racial slurs and the importance of respectful language, I encourage you to explore resources from organizations like the Southern Poverty Law Center or the Anti-Defamation League.

Let's work together to create a more inclusive and respectful world.]-ganov fixes it in llama.cpp
>>
>>101233757
*giggles*
>>
>>101233757
thought it was something about troons till the end
>>
>ooba hasn't bothered to push a version of llamacpp with g2 support even on the dev branch
man they really don't give a fuck anymore, do they
>>
>>101233790
there is no support, it works like shit.
>>
if you don't use llamacpp or tabby you are a chud.
>>
>woke up
>lmao.cpp is still broken
This is beyond unreasonable, gotta be a clever PR campaign to hype up 27b.
Well fucking done, google.
>>
so gemma 27b is smaller than llama 3 70b, at least as smart, writes in a more natural way, and is less censored?
>>
>>101233797

>>101233653
>>
https://github.com/ggerganov/llama.cpp/pull/8244

IM GONNA GGOOOF
>>
>>101233810
>at least as smart
That's questionable.
Could be better for ERP however.
>>
>>101233810

Yes. Imb4 shill
>>
need more 100-120b models
big enough that vramlets have no hope of running them but not so big that i cannot run them
>>
>>101233819
how many waves of broken ggufs will we have with gemma 2?
>>
>>101233810
Yes. NovelAI just went bankrupt because of it.
>>
>>101233536
Hochi mama, lookin GOOOOOD just in the thumbnail, excited to check these out. Huge thanks anonie!
>>
>>101233810
Which interface is it just as smart in, locally? Last I heard it was still retarded/schizo in both llamacpp and Transformers implementations, and only lmsys seemed to have a version of it working at full intelligence.
>>
It's going to be fun as hell seeing when it's "completely fixed" and 27b gets shit on for being just a moderately smart model and no one bothers tuning or doing anything with it. I have a gut feeling it's not even going to be trainable if it's taking this much effort to just get it to work
>>
File: 1025 - SoyBooru.png (16 KB, 721x720)
16 KB
16 KB PNG
>>101233796
if you use COBold you look like this thusever
>>
>>101224321
>►Getting Started
>https://rentry.org/llama-mini-guide
>https://rentry.org/8-step-llm-guide
>https://rentry.org/llama_v2_sillytavern
>https://rentry.org/lmg-spoonfeed-guide
>https://rentry.org/rocm-llamacpp
>https://rentry.org/lmg-build-guides
Do I need all six links or do I pick one?
>>
I just did a quick trivia test of _L quants. L3 Q2_K vs Q2_K_L vs Q3_K vs FP16. 10 questions based on pop culture stuff I randomly thought of. Objective tests using logits.

My initial finding is that in 6 cases, the probability of getting the answer right is higher with _L, while in 4 cases, normal Q2_K is more accurate. Q3_K had more accurate logits than either of those in all questions. Surprisingly, there were 4 questions where Q3_K was also more accurate than FP16. I guess this underscores how important a statistically significant sample size is. Though I'm tempted to make these conclusions now:

Q3_K is smaller than Q2_K_L, meaning that it's much more worth it to spend the VRAM on a higher non-L quant. Always choose the non-L, even if the non-L is slightly smaller. If you have too much VRAM but not enough for FP16, then going for Q8_0_L may give very slightly more quality, and it's "safe" to just go for it, but it probably doesn't really matter.
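
If anyone wants to rerun this with a real sample size, something along these lines is enough (sketch only, not my exact script; assumes llama-cpp-python's OpenAI-style logprobs output, and the trivia question and model paths are placeholders):
[code]
# sketch - compares how much probability each quant puts on the first token
# of the known-correct answer; question and file names are placeholders
from math import exp
from llama_cpp import Llama

QUESTIONS = [
    # (prompt, expected first token of the answer)
    ("Q: Which band released the album Nevermind?\nA:", " Nirvana"),
]

QUANTS = [
    "llama3-8b-Q2_K.gguf",
    "llama3-8b-Q2_K_L.gguf",
    "llama3-8b-Q3_K.gguf",
]

for path in QUANTS:
    llm = Llama(model_path=path, n_ctx=2048, logits_all=True, verbose=False)
    for prompt, answer in QUESTIONS:
        out = llm(prompt, max_tokens=1, temperature=1.0, logprobs=20)
        top = out["choices"][0]["logprobs"]["top_logprobs"][0]  # token -> logprob
        prob = exp(top[answer]) if answer in top else 0.0       # 0 if not in top-20
        print(f"{path}: p({answer!r}) = {prob:.4f}")
[/code]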
>>
>>101233858
It shouldn't matter, since the only new piece of metadata that was created was regarding SWA, and they use a default value that's right for Gemma2 in case the gguf doesn't have that value.
The problems are mostly on the backend side really.

>>101233877
To me the biggest bummer is it not being compatible with FA2.

>>101233903
>I guess this underscores how important a statistically significant sample size is.
We should start gifting shirts to people with that written on them.
>>
>>101233870

>>101232067
But you apparently need a 48GB+ card
>>
>>101233877
yeah what were they thinking with >8k
>>
>>101233901
If you get it to work, you'll be reading shit for hours and hours. None of them is complete; they're all fairly old. Skim through all of them, it just takes a few minutes. When in doubt, however, the project's documentation is king.
For easy setup, just download+build llama.cpp, download some random llama3 model (converted to gguf) and give it a go. You'll learn along the way.
>>
>>101233866
>>101233536
Update: Damn, checked em out and this is great. Perfect tummy shape and poochiness, A+.
>>
I always feel bad for using Q8_0 because 0 is inefficient, why's there no Q8_K_M?
>>
>>101233980
>because 0 is inefficient
It is? Damn.
>>
>>101233901
the 4th one worked for me
>>
>>101233901
To get started?
Just download koboldcpp and a gguf model that's appropriate for your hardware.
Then go into the rabbit holes of using different frontends, models, settings, trying to make the most of your hardware with exllama or llama.cpp, etc.
>>
>>101233980
At that size, there's very little benefit from a more compact encoding. Use Q6 if you want something slightly smaller.
>>
>>101233980
stop talking about what you don't understand
>>
>>101233968
>For easy setup, just download+build llama.cpp, download some random llama3 model (converted to gguf) and give it a go. You'll learn along the way.
>>101234018
>Just download koboldcpp and a gguf model that's appropriate for your hardware.
I prefer a more hands-on approach so that's the best pointer I could ask for
>>101234015
If things go bad I will reference the 4th one
Thanks for the tips, anons!
>>
>>101233992
Yeah fp16 is the only way forward
>>
>>101234040
fp32*
>>
>>101233839
If exclusivity is what brings you pleasure, you should lock your door and sniff your own farts, knowing we vramlets aren't getting a whiff.
>>
>>101234058
>FixedPoint32.32
>>
>>101234058
Exactly. Why lose any precision.
>>
>>101234029
it's a question
>>
>>101234075
the smell of my own farts doesn't trigger the same level of violent orgasms as my 110b models though
>>
File: file.png (4 KB, 370x15)
4 KB
4 KB PNG
I hate this
>>
>>101224321
How do you make the models stop whipping up random epilogues in the middle of the chat or attempting to steer the scene into a summary of it, rather than roleplay it out?

Things like, in the middle of an adventure story:
>And then, with the power of friendship, the pair confronted the challenges ahead, empowered by their unique bond.
When barely setting out on an adventure.
Or:
>The pair chatted on through the night.
When I set up a night scene and begin a dialogue?
I want the model to roleplay the scene, not skip ahead or conclude the story.
>>
>>101234121
We always did say that 1 t/s is all you need, which means you also need it.
>>
>>101233977
>Update: Damn, checked em out and this is great. Perfect tummy shape and poochiness, A+.
Thanks. AutismmixXL just nails it. I highly recommend trying it, though be warned, it leads to setting batch numbers higher and higher once you figure out how to generate what's boner fuel to you. It's very addicting to just let it spit out a ton of gens, and then pore over them, choosing the hottest ones - rinse and repeat.

Here's another for those who like 'em skinny.
>>
>>101234161
some models (like BMT) are just like this, you can't really change them much. Edit and continue roleplaying.
>>
>>101234205
would breed
>>
>>101233903
Complications:
Was temperature near 0 for consistency of output? For Q3_K to beat FP16 sounds like accidental success to me.
Q8_0_L is not necessarily like a Q_K_L. It's one of those new quants that the guy who's been proselytizing his quant scheme cooked up. That doesn't mean it's bad, just that it's its own thing.
How did the _S quants hold up?

>>101233980
Can't exist, because a K_M would include some Q9 quants which aren't a thing.

>>101233992
_0 is simple, but it's what you have at Q8.

>>101234114
You need to make some dietary changes so you can rip you some winners.
>>
File: 898978984445.png (138 KB, 1658x762)
138 KB
138 KB PNG
>>101233824
>That's questionable.
70B doesn't even make the top 12
>>
gemma 2 llama.cpp status?
>>
>>101234312
Why wait

https://github.com/huggingface/local-gemma
>>
>>101234298
Though that massive 40 point difference between open and closed source means every model we're using sucks.
>>
>>101234357
>python
So we use it to ask Gemma to refactor the code into a real language?
>>
Is Gemma 2 usable with Transformers and bitsandbytes?
>>
>>101234357
q4 bitsandbytes is not as accurate as q5.
>>
>>101234513
But it's not broken like llama.cpp is and doesn't need 48GB like mistral.rs does
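
Untested sketch of that route, assuming a transformers build recent enough to know the gemma2 arch (eager attention because of the logit soft-capping, supposedly the safe choice right now; model id is the 9B, swap in the 27B if you have ~16GB+ free):
[code]
# untested sketch - plain transformers + bitsandbytes 4-bit
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "google/gemma-2-9b-it"   # or "google/gemma-2-27b-it" with more VRAM
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb,
    device_map="auto",
    attn_implementation="eager",    # sdpa/FA reportedly don't handle the soft-cap yet
)

msgs = [{"role": "user", "content": "Say something nice about local models."}]
ids = tok.apply_chat_template(msgs, add_generation_prompt=True, return_tensors="pt").to(model.device)
out = model.generate(ids, max_new_tokens=128, do_sample=True, temperature=0.7)
print(tok.decode(out[0][ids.shape[-1]:], skip_special_tokens=True))
[/code]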
>>
>>101233980
I don't think there would be much of a point honestly.
The k-quants have a more complicated structure which trades some speed for quantization efficiency in terms of quality per size.
8 bit quantization is precise enough that you don't have to do that.
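
If you want to see why, here's a toy version of the Q8_0 idea (one absmax scale per block of 32 weights; simplified, not the actual gguf layout). The roundtrip error at 8 bits is already down in the noise, so there isn't much left for the fancier k-quant structure to claw back:
[code]
# toy illustration only, not the real gguf block format
import numpy as np

def q8_0_roundtrip(w: np.ndarray, block: int = 32) -> np.ndarray:
    out = np.empty_like(w)
    for i in range(0, len(w), block):
        chunk = w[i:i + block]
        scale = np.abs(chunk).max() / 127.0 + 1e-12   # one scale per block of 32
        q = np.round(chunk / scale).clip(-127, 127)   # 8-bit integer codes
        out[i:i + block] = q * scale                  # dequantize
    return out

w = np.random.randn(4096).astype(np.float32)
print(np.abs(w - q8_0_roundtrip(w)).max())   # worst-case error ~1e-2, basically noise
[/code]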
>>
>>101234528
I'd rather wait.
>>
>>101234292
Yes I was using deterministic sampler settings. Technically Llama.cpp needs a temperature of 1 to correctly eliminate the effect of temperature on the logits though.
>For Q3_K to beat FP16 sounds like accidental success to me
Accidental success makes sense in this case because quantization essentially adds noise to a model, and some of these questions are ones that are not easy for 8B to answer, meaning the answer is contained within the model, but not strongly, so adding noise could either further hide the correct answer, or bring it to the surface, in the logit distribution.

Anyway, there's a reason why statistical significance is important and exists as a concept.

I didn't test _S quants.
>>
>>101234298
Gemini doesn't count. Nobody uses that so you can ignore half that list.
>>
>>101234547
...ok? It's a single-line install btw

pip install local-gemma"[cuda]"
>>
>>101234543
>structure which trades some speed for quantization efficiency in terms of quality per size
Sure but if you have to use CPU at all for IQ inferencing it's still going to be slower than the equivalent K quant right?
>>
>>101234298
>he still takes lmsys leaderboard seriously
Lol.
>>
>>101226528
Doesn't hurt to try. At worst you lose $1k, but you can sue techpowerup for it.
>>
>>101234583
Does it still have 70B above Claude Opus for English? lol
>>
>>101234578
In terms of complexity iq-quants > k-quants > legacy quants with lower complexity being faster.
But the CUDA code for iq-quants was pretty unoptimized so the same may be true for the CPU code.
>>
>>101234583
Test the models out yourself. You'll find it's true. This leaderboard is like a single step above MMLU for determining model quality. Though it doesn't account for everything.
>>
>>101234638
Are you kidding? Llama 3's responses on a ton of shit are obnoxious as fuck and it sucks compared to a ton of other models on the list, even if it is smart. These are only a good indicator of model quality if your definition of model quality is, apparently, the human average.
>>
>>101234680
>Llama 3's responses on a ton of shit is obnoxious as fuck and it sucks compared to a ton of other models on the list, even if it is smart.

You are incredibly biased. How slopped a model is does not determine its usefulness in day-to-day tasks. GPT-4 is the most gptslopped model there is, and even so you know it's good.
>>
I pushed the current Mikubox (2x 3090 3x P100) to the 8K context limit with command-r-plus 5bpw

 Device 0 [NVIDIA GeForce RTX 3090] PCIe GEN 1@ 4x RX: 0.000 KiB/s TX: 0.000 KiB/s
GPU 0MHz MEM 405MHz TEMP 40°C FAN 0% POW 28 / 350 W
GPU[ 0%] MEM[||||||||||||||||||23.825Gi/24.000Gi]

Device 1 [Tesla P100-PCIE-16GB] PCIe GEN 3@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
GPU 1189MHz MEM 715MHz TEMP 36°C FAN N/A% POW 32 / 250 W
GPU[ 0%] MEM[||||||||||||||||||15.729Gi/16.000Gi]

Device 2 [Tesla P100-PCIE-16GB] PCIe GEN 3@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
GPU 1189MHz MEM 715MHz TEMP 34°C FAN N/A% POW 33 / 250 W
GPU[ 0%] MEM[||||||||||||||||||15.847Gi/16.000Gi]

Device 3 [Tesla P100-PCIE-16GB] PCIe GEN 3@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
GPU 1189MHz MEM 715MHz TEMP 38°C FAN N/A% POW 34 / 250 W
GPU[ 0%] MEM[||||||||||||||||||13.540Gi/16.000Gi]

Device 4 [NVIDIA GeForce RTX 3090] PCIe GEN 1@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
GPU 0MHz MEM 405MHz TEMP 42°C FAN 57% POW 29 / 370 W
GPU[ 0%] MEM[||||||||||||||||||23.156Gi/24.000Gi]


It just fits, and at full context, I'm getting about 2 t/s. Yeah, slow, but tolerable with streaming turned on. Ah well, nothing left to do but swap the P100s for 3090s, since the next plateau is being able to have flash attention at this model size. It's not really going to get much faster without it.
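
Back-of-envelope on why it's that tight, for anyone planning a similar build (rough numbers, not an exact accounting):
[code]
# rough numbers only
params_b  = 104e9                      # command-r-plus parameter count
bpw       = 5.0                        # exl2 quant
weights   = params_b * bpw / 8 / 1e9   # ~65 GB just for the weights
pool      = 2 * 24 + 3 * 16            # 96 GB across the five cards
print(weights, pool - weights)         # ~65 GB weights, ~31 GB for the 8K KV cache,
                                       # activations and per-GPU overhead
[/code]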
>>
>>101234775
>2 t/s
That's not worth the effort.
>>
>>101234769
Are you sure you're not the biased one? Most people here don't give a shit about "usefulness". If they wanted usefulness they'd just use GPT-4. It's clear that RP is the predominant use case for us. All we ever talk about is how slopped models are these days. If the benchmark can't account for that, then it's useless to most of the thread.
>>
>>101234205
>Here's another for those who like 'em skinny.
Based
>>
>>101234566
>Technically Llama.cpp needs a temperature of 1 to correctly eliminate the effect of temperature on the logits though.
Maybe I misunderstand Temperature.
I figured that going as low as possible (Kobold seems to bottom out at 0.01 temp) would ensure the most likely token is (almost?) guaranteed, so retries would give the same results (which seemed to be the case) and would be the best representation of what the model "knows."
>>
>>101234849
RP is important, but it alone is not a good measure of quality. If that's all you care about and not intelligence then L2 finetunes are all you need.
>>
>>101234864
That is correct actually. It's just that I also went a step further to look at exactly how likely a token would be selected compared to all other tokens. I just chose to report my results as binary correct or incorrect, since it looked like the differences weren't that huge anyway.
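
To illustrate the temperature point, the whole effect fits in a few lines (toy logits, nothing model-specific): T=1 leaves the softmax untouched, which is what you want when reading off raw probabilities, while pushing T toward 0 keeps the same argmax but squashes the distribution you'd be comparing.
[code]
# toy logits for three candidate tokens - illustration only
import numpy as np

def softmax_t(logits: np.ndarray, temp: float) -> np.ndarray:
    z = logits / temp
    z = z - z.max()          # numerical stability
    p = np.exp(z)
    return p / p.sum()

logits = np.array([4.0, 3.5, 1.0])
for t in (1.0, 0.5, 0.01):
    print(t, softmax_t(logits, t).round(4))
# 1.0  -> the model's actual probabilities
# 0.01 -> ~[1, 0, 0]: same top token, but the relative likelihoods are gone
[/code]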
>>
>>101234947
>>101234947
>>101234947
>>
>>101234881
You've got it wrong. The reason people complain about slop now is because we've already reached a point where at least the large models were sufficiently smart to do their RPs coherently. So good RP is a combination of intelligence and prose. What you're trying to say here is that there's a single definition of what constitutes "model quality". And what I'm saying is that your definition is just as biased. I'd bet that a lot of the people who go on lmsys to test models aren't testing it with RP prompts, or rather a lot of RPers aren't going on lmsys to test their use case, meaning that the leaderboard is missing a significant portion of the population that uses AI.

Anyway, there are a lot more flaws with the method lmsys uses. I would say it is not actually an accurate reflection of intelligence compared to MMLU (or rather MMLU Pro now, perhaps).


