I pushed the current Mikubox (2x 3090, 3x P100) to the 8K context limit with command-r-plus at 5bpw:
Device 0 [NVIDIA GeForce RTX 3090] PCIe GEN 1@ 4x RX: 0.000 KiB/s TX: 0.000 KiB/s
GPU 0MHz MEM 405MHz TEMP 40°C FAN 0% POW 28 / 350 W
GPU[ 0%] MEM[||||||||||||||||||23.825Gi/24.000Gi]
Device 1 [Tesla P100-PCIE-16GB] PCIe GEN 3@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
GPU 1189MHz MEM 715MHz TEMP 36°C FAN N/A% POW 32 / 250 W
GPU[ 0%] MEM[||||||||||||||||||15.729Gi/16.000Gi]
Device 2 [Tesla P100-PCIE-16GB] PCIe GEN 3@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
GPU 1189MHz MEM 715MHz TEMP 34°C FAN N/A% POW 33 / 250 W
GPU[ 0%] MEM[||||||||||||||||||15.847Gi/16.000Gi]
Device 3 [Tesla P100-PCIE-16GB] PCIe GEN 3@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
GPU 1189MHz MEM 715MHz TEMP 38°C FAN N/A% POW 34 / 250 W
GPU[ 0%] MEM[||||||||||||||||||13.540Gi/16.000Gi]
Device 4 [NVIDIA GeForce RTX 3090] PCIe GEN 1@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
GPU 0MHz MEM 405MHz TEMP 42°C FAN 57% POW 29 / 370 W
GPU[ 0%] MEM[||||||||||||||||||23.156Gi/24.000Gi]
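A quick back-of-envelope check on why it "just fits". The numbers below are assumptions, not from the post: Command R+ is ~104B parameters, and (per its published config) has 64 layers, 8 KV heads, and head_dim 128; the cards above total 2x24 + 3x16 GiB.

```python
# Rough VRAM footprint for command-r-plus at 5bpw across this box.
GIB = 1024**3

params = 104e9                       # ~104B parameters (Command R+)
bpw = 5.0                            # quantized bits per weight
weights_gib = params * bpw / 8 / GIB

total_vram_gib = 2 * 24 + 3 * 16     # two 3090s + three P100s = 96 GiB

# FP16 KV cache at 8K context, assuming 64 layers, 8 KV heads,
# head_dim 128 (the published Command R+ config):
kv_bytes_per_token = 2 * 64 * 8 * 128 * 2   # K+V, fp16
kv_gib = kv_bytes_per_token * 8192 / GIB

print(f"weights ~{weights_gib:.1f} GiB + KV cache ~{kv_gib:.1f} GiB "
      f"of {total_vram_gib} GiB total")
# → weights ~60.5 GiB + KV cache ~2.0 GiB of 96 GiB total
```

That leaves roughly 30 GiB for activations, scratch buffers, and per-card overhead, which lines up with the nvtop readout showing every card nearly full.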
It just fits, and at full context I'm getting about 2 t/s. Yeah, slow, but tolerable with streaming turned on. Ah well, nothing left to do but swap the P100s for 3090s, since the next plateau is getting flash attention at this model size (flash attention needs newer silicon than Pascal, so the P100s rule it out). It's not really going to get much faster without it.
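For a sense of how much headroom is left: with the layers pipelined across cards, each decoded token streams the full weights once, so memory bandwidth puts a ceiling on t/s even before attention costs. This sketch uses datasheet bandwidths (~936 GB/s for a 3090, ~732 GB/s for a PCIe P100) and assumes the ~65 GB of quantized weights split half-and-half between the 3090s and P100s, roughly in proportion to their VRAM; all of that is assumption, not measurement.

```python
# Bandwidth-bound decode ceiling for a pipelined multi-GPU split.
weights_bytes = 104e9 * 5 / 8        # ~65 GB of 5bpw weights (assumed)

on_3090 = weights_bytes * 48 / 96    # share on the two 3090s (48/96 GiB)
on_p100 = weights_bytes * 48 / 96    # share on the three P100s

# Each token reads every weight once; stages run back-to-back.
seconds_per_token = on_3090 / 936e9 + on_p100 / 732e9
print(f"bandwidth-bound ceiling ~{1 / seconds_per_token:.0f} t/s")
# → bandwidth-bound ceiling ~13 t/s
```

The observed ~2 t/s at full context sits well under that ceiling, consistent with attention over the 8K window (run without flash attention on the Pascal cards) eating most of the time rather than the weight reads themselves.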