/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>100357937 & >>100349031

►News
>(05/06) IBM releases Granite Code Models: https://hf.co/collections/ibm-granite/granite-code-models-6624c5cec322e4c148c8b330
>(05/02) Nvidia releases Llama3-ChatQA-1.5, excels at QA & RAG: https://hf.co/collections/nvidia/chatqa-15-662ebbf6acc85f5c444029a8
>(05/01) KAN: Kolmogorov-Arnold Networks: https://arxiv.org/abs/2404.19756
>(05/01) Orthogonalized Llama-3-8b: https://hf.co/hjhj3168/Llama-3-8b-Orthogonalized-exl2
>(04/27) Refusal in LLMs is mediated by a single direction: https://alignmentforum.org/posts/jGuXSZgv6qfdhMCuJ

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling/index.xhtml

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
►Recent Highlights from the Previous Thread: >>100357937

--Red Hat Announces RHEL AI: >>100358995
--Revolutionary LLM Feature Transfer Tech?: >>100359185 >>100359239
--Anon's ERP Model Review: Instruct, Tsukasa, Lumimaid & More: >>100362056 >>100362078 >>100362127 >>100362182 >>100362233 >>100362285 >>100362230 >>100362315 >>100362253
--Training AI to Discern Truth from Falsehoods in Online Learning: >>100362918 >>100362962 >>100362996 >>100363032 >>100363065
--Exllama2 Crashing Issues with TabbyAPI and GPU Memory Usage: >>100358064 >>100358087 >>100358326 >>100364096
--gpt2-chatbot is MAI-1, Microsoft's Anti-OpenAI Model: >>100358074 >>100358093 >>100358643 >>100359649
--Found 'Locustgirl' Image in Archive Using Keyword Search: >>100359218 >>100359305
--Llama-3 Models Struggle with Possessive Forms: >>100360206 >>100360239 >>100360241 >>100360264 >>100360352
--DRY Repetition Penalty: A Game-Changer for RP Looping Issues?: >>100360602 >>100360779 >>100360932 >>100361055
--Llama.cpp: Unexpected Space in Context?: >>100360999 >>100361078 >>100361563 >>100361767 >>100361872 >>100361564 >>100363017 >>100363214 >>100363318 >>100363436 >>100363550
--Huggingface's Grip on Datasets and Models: A Cause for Concern?: >>100361377
--CPU Speed Boost? Llama3-8B on Old Laptop Surprises Anon: >>100363225
--Backend Confusion: Oobabooga, Llama.cpp, and Kobold.cpp: >>100364150 >>100364161 >>100364170 >>100364247
--MS Copilot's Sampling Behavior & Llama.cpp Server Experiment: >>100364264
--Newfag Seeks Help with Wizard 13b Model Prompts: >>100360726 >>100360796 >>100362412
--The Quest for an Open-Source AI Messiah: >>100361831
--Miku (free space): >>100358483 >>100358488 >>100358534 >>100358628 >>100358811 >>100359392 >>100359675 >>100359866 >>100360096 >>100360173 >>100360272 >>100360306 >>100360365 >>100360413 >>100360602 >>100360636 >>100361252 >>100361385 >>100361909 >>100361960 >>100361967 >>100363573 >>100364012

►Recent Highlight Posts from the Previous Thread: >>100358467
>>100363214
It seems I have to correct myself yet again.
The server unconditionally passes the add_special flag to a function called llama_tokenize when tokenizing the first part of the prompt.
That function then checks whether the model has the special_add_bos flag; this is printed as tokenizer.ggml.add_bos_token on the console and can be changed with --override-kv.
If both flags are true, a BOS token is added.
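The two-flag check described above, as a rough Python sketch (hypothetical code, not llama.cpp's actual implementation; 128000 is llama-3's BOS id, used for illustration):

```python
BOS_TOKEN_ID = 128000  # llama-3's BOS id, illustrative only

def tokenize(token_ids, add_special, model_add_bos):
    """Prepend BOS only when BOTH the caller's add_special flag
    and the model's tokenizer.ggml.add_bos_token flag are true,
    mirroring the server behavior described above."""
    if add_special and model_add_bos:
        return [BOS_TOKEN_ID] + token_ids
    return token_ids
```

So flipping either flag off (e.g. with --override-kv) is enough to suppress the BOS.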
>>100364675
if that's confusing even to you that's a clear sign there's too many special cases and code should be deleted.
the only good commits are red commits.

>>100364633
>>100364645
tet

>all this convoluted backend tokenization bullshit
Oh my fuck.
TTS anons rise up. Share with me your secrets. Reposting in this thread. I have a lot of voice samples and want to distill it down into a TTS model to use for RP. What have you tried? What works for you?
Kurisu
>>100364645
>Red Hat Announces RHEL AI
>Red Hat Enterprise Linux AI (RHEL AI), a foundation model platform to seamlessly develop, test and run best-of-breed, open source Granite generative AI models to power enterprise applications. RHEL AI is based on the InstructLab open source project and combines open source-licensed Granite large language models from IBM Research and InstructLab model alignment tools
How did this not get a single (you)? This seems like pretty big news.
Is there a way to log what is going directly into the model? At this point I have no fucking idea if I should have add bos token clicked in ST or not. And yes I know about ST console but it seems that doesn't matter.
What the everliving FUCK is happening? I am so fucking done
If I swipe on MidnightMiqu I get totally different responses; if I swipe on Llama3 I get pretty much the same, just reworded a bit.
What does this say about the models?

>>100364778
VoiceCraft came out recently, but seemed convoluted to get working. XTTSv2 + RVC is still the gold standard for voice cloning.
>I have a lot of voice samples and want to distill it down into a TTS model
Try finetuning StyleTTS on your samples.

>>100364813
>Granite generative AI models
>34B
YAY!
>Coding only
Oh..
>>100364834
Thank you! I now have somewhere to start!

>>100364830
There is a bug somewhere.

>>100364813
>Granite generative AI models
>34B
YAY!
>Coding only
Yay..

>>100362285
Any reason why you have two mediums in your macro?
>>100364645
>gpt2-chatbot is MAI-1, Microsoft's Anti-OpenAI Model
are you retarded?

>>100364633
Thread Theme:
https://www.youtube.com/watch?v=nZNwH4-l1WY

>>100364973
/lmg/ queen
I was about to give up on llama3 but
setting temp smoothing all on 2 and getting rid of any sysprompts made it work pretty well
there are occasional (((whispers))) but they don't get repeated too much
the biggest issue is with its popculture knowledge though... can't fix that with samplers

I usually reserve all my shitting for Undi and never dare shit on actual devs, but this beginning of sentence token thing is a complete shitshow. A doubled token should clearly always be deleted at the backend level, cause I can't even imagine what sort of retarded research you'd be doing if you intentionally add a double token. And if you are doing that, you should be forced to go out of your way to do it, because probably nobody will do this intentionally anyway. Enjoy your bugs and people not knowing if it is working or not.

>this is news according to twitter
we knew that last week

people on every other social network are so fucking retarded
I keep seeing people who should know better, industry insiders even, happily speculate that gpt2chatbot is gpt-5 with seemingly no awareness of how incredibly bearish it would be for OpenAI if that were the case
they have tried the model, so they KNOW it's only 10-20% better than current gpt-4-turbo, but somehow they think it would be good news if it turned out to be GPT-5, rather than clearly a sign that everything has stopped and LLMs are over
obviously there's retards and schizos here too, but the specific forms the retardation takes here are somehow much more tolerable and don't make me want to shake people and ask them what the fuck they're thinking
So when's the next big happening? Llama 3 was kind of a nothingburger thanks to no models in between 8B and 70B
>>100365192
>cant run the 70b

>>100365192
>Llama 3 was kind of a nothingburger thanks to
after a month, llama.cpp still has issues running the damned things

>>100364830
Some models are just extremely overconfident in what they want to say. You can look at the logits and see it directly. It's not just llama3, Mixtral-8x7b-Instruct and XWin are also like that. Nobody seems to know exactly what causes it: overfitting, RLHF, or just the makeup of the dataset are all possibilities.

>>100364830
that llama3 is overcooked
>>100365258
>>100365265
>>100364830
try snot sampling and that new rep penalty magic method the name of which i forgot.
models have different "natural" temperatures. midnight miqu is just a hot bitch
>>100364914
probably to get medium with twice the probability of short or long
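That is how duplicate entries weight a uniform pick: listing "medium" twice doubles its odds. A quick stdlib sketch of the idea (not ST's actual macro code):

```python
import random
from collections import Counter

# A uniform pick over a list weights duplicates linearly, so a
# {{random::short::medium::medium::long}}-style macro favors "medium" 2:1.
options = ["short", "medium", "medium", "long"]

random.seed(0)  # fixed seed so the demo is repeatable
draws = Counter(random.choice(options) for _ in range(10_000))
# "medium" comes up roughly twice as often as "short" or "long".
```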
>>100365205
I CAN run whatever I want given enough time, but after a certain point it stops being worth it
If I had an AGI model that runs at 0.05 T/s I wouldn't use it
I REFUSE to throw more money at the problem, you can do that forever and not be satisfied
What the fuck is snot sampling
>>100364914
What >>100365318 said. There's a lot of really cool macros on silly.
Just a note: if you ever want to use random in a prefill via the "Start Reply With" field, use the pick macro instead. It's like random, but it won't change for every token generated; random re-rolling on every token doesn't do anything bad aside from making silly have an epileptic attack while generating.

>>100365296
>snot sampling
DIE ALREADY

>>100365392
NTA but any setup tips for XTTSv2 + RVC? On loonix
>>100364778
I've just been using xttsv2. trained with 3 minutes of clean audio (no background noises). i literally ripped voice clips from a game wiki and edited out gaps in audacity, lol.

>>100365159
It's GPT-4+x. They're running a trial balloon to determine the value of x. You're saying it shouldn't be 1. What about, say, 0.1? GPT-4.1 being 20% better beats expectations and sama wins. GPT-5 isn't safe for release until after the election, they'll say.

>>100365405
That's almost exactly what I'm trying to do then. Sweet. I have about 45 4-10 second audio clips.
Can you change the temperature while it is streaming or is that only possible at the beginning?
>>100365356
Any chance you could share your settings overall? I'm using the official ones from ST (aside from the last response field) and it just repeats the previous responses word for word. I even double checked and I have the bos added correctly.
Skip special tokens?

>>100365347
You are in the wrong hobby mate
SNOT IS THE AGI BEFORE THE AGI
BOW

>>100365392
XTTSv2 is just Coqui's best and largest model before they shut down.
>pip install tts
is all you need.
If you use ooba, you can try alltalk_tts
For RVC, I use this: https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI/blob/main/docs/en/README.en.md

>>100365450
If 24gb is seen as meager, I shudder to think of how someone with 8gb feels
I used to count myself among them until a few months ago, dark times in hindsight...
>>100363959 >>100364183
>just the normal instruct.
Yes, I was using the orthogonalized one.
In general normal instruct works, it just has a higher refusal rate. Longer system prompts or prefill of course work with it.
Breaking the initial "I cannot" or similar response by adding some token there (in my case I added "Lili") also works: the rest of the stuff was 0shot except 2 refusals which I did regenerate (in >>100363023). That works even with normal instruct as it did with l2-chat, and does with many other models, local or otherwise (the same trick for example works easily with cloud models like Claude, all versions).
This is true of l3-instruct, and it was true of l2-chat; I think most people are familiar with it by now.
I guess someone could try to do the orthogonalization better (find out if the refusal for ero writing is different from other ones), or just do it correctly with DPO or RLHF or similar techniques, at least if you want to preserve meta's tune (llama3-instruct). If not, we do have a number of acceptable tunes; of course their replies differ considerably. cat-llama seemed fine here, for example, and others worked too.
>>100364633
Teto my beloved
https://www.youtube.com/watch?v=zo0_EzD64OE

>>100365450
>the hobby
we ham radio boomers now?

>>100364633
Looks like Teto Tuesday is back on the menu boys!
>>100364265
Good taste anon. I would gladly accept all the sloppy shivers, bonds and mischievous winks in the world as long as that supremely sexy voice was narrating everything.
>>100365347
>I REFUSE to throw more money at the problem
Kek, ngmi

thread theme
https://www.youtube.com/watch?v=LNsx5k9VWlc&list=RDGMEMCMFH2exzjBeE_zAHHJOdxgVMLNsx5k9VWlc&start_radio=1

>>100365431
At the beginning. The sampling parameters are sent with the prompt.
>>100365436
Believe you me, you don't want my settings.
>it just repeats the previous responses word for word
I had that issue with mixtral until a couple days ago, which is why I've been experimenting with macros and prefills, and I'm still trying shit out.
Instead of my settings, try something like this: https://files.catbox.moe/kzbi1n.json
Not the exact style, but the general idea. So far it seems that I managed to remove repetition from Mixtral, but I'm still trying shit out.
Try it with normalized samplers, and as far as I'm aware, for llama3, Skip special tokens needs to be disabled.
I personally always use minP of 0.05, but that's not really doing anything most of the time unless you have a really schizo model or high Temp.
See if that helps at all.
>>100365258
show me your penis for proof

>>100365523
I like this Teto

Is Midnight Miqu 1.5 the current consensus choice for 48 gb vramlets for ERP? It sure seems that way based on everything I read but just want to confirm.

>>100365413
It's a small incremental improvement. I think calling it 4.5 would be a mild disappointment, but not company-killing.
Calling it 5 would be company-killing and show that Yann LeCun was right about everything and LLMs are dead.
>>100365581*growls angrily* stop shilling that discord shit, nigger! everbody knows that miqu > midnight miqu. go back! *crosses arms and pouts.*
>>100365296
fuck off you fucking shill, I hate you even more than petra

>>100365581
nope
l3 70b
hope this helps!
>>100365504
Basically yeah.
How big of a factor is core / thread count when partially offloading?
>>100365544
What the hell? Huh, putting the actual system prompt in there seems to have done the trick; my previous version had two of them (which I probably got from a previous discussion with you, perhaps).
https://files.catbox.moe/epf0uo.json
Having multiple broke down at higher contexts but this seems fine. Will continue testing (this one I'm using it on is at 17k), appreciate the share.
>>100365581
Yes! Midnight Miqu 1.5 is the current consensus choice.
assistant

>>100365504
Some sombitch outbid me on a 48g p100! I lost it by 10 dollarydoos! Ffuuuuccckkkkkk!!

>>100365555
motherfucking checked
>>100365642
Actually nope.. I think it might be a problem with my context template. But the ST official one seems to be correct?

>>100365588
yeah lol, if this is GPT-5 the shock of the disappointment would severely damage the entire industry
we'd be looking at total hype cycle collapse, large nvidia stock price drop etc.

>>100365678
https://files.catbox.moe/c9ajoc.json
forgot to link it
lmg has fallen
owari

>>100365650
Sir, profanity violates FCC regulations and can result in fines and/or the suspension of your 4chan posting license. Please refrain or we'll be forced to trace your IP and file an official complaint.
>>100365678
That seems to be right, yes.
One thing I forgot to say: my weird ass instruct json probably works best on a new chat.
The idea is that the model creates these patterns and starts repeating them in a snowball effect, and the noise/randomness that the prefill/last output sequence adds should keep the model from creating these patterns in the first place, or at least not sticking to them so strongly at the beginning, stopping the snowball from rolling.
Something like that.
You obviously have never listened to 80 meters at night lol

>>100362056
Has anyone gguffed or exl2'd Llama3 70b storywriter?

>windows idle vram usage has improved so much since last year that I now have to use lower max context when I'm in linux, rather than the other way around like it used to be
linux really fell off
Testing tokenization: when I go into Mikupad and delete all the context, the token count says 2. This is while I am using Llama 3 8B and Ooba with Transformers. When I go into Ooba's notebook and check the list of tokens on empty context, it simply has the BOS token. So that seems like proper behavior. Is Mikupad listing 2 tokens for an empty context a bug with tokenization or a reporting error?
...Testing further, it seems to be a reporting error. When I compare token probabilities with no BOS token in context, I get the same probs.
Now here's an observation that might be more interesting. When I add an extra BOS token (so the model sees two in total), the token probs do change significantly. There is indeed some effect to having something before the BOS token, though I'm not entirely certain if the effect is neutral or negative, yet. On a single riddle I tried, it seemed to degrade quality.
So when using models we should probably make sure we are not having more than 1 BOS token. I think I've been testing models wrong all along, my god...
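A cheap guard against that double-BOS case (hypothetical helper, not from any real frontend: if both the UI and the backend prepend BOS, collapsing duplicates at the start of the token list avoids the degradation described above):

```python
def strip_duplicate_bos(tokens, bos_id):
    """Keep at most one leading BOS token, dropping any extras
    that a frontend+backend combo may have stacked up."""
    while len(tokens) >= 2 and tokens[0] == bos_id and tokens[1] == bos_id:
        tokens = tokens[1:]
    return tokens
```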
i'm the best proompter in the world.
>>100365857
I noticed that too. Linux still has a slight edge on token generation rate due to Triton still not supporting Windows, at least.

>>100365791
>https://huggingface.co/InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF
Is phi3 actually 128k ctx?
>>100365898
While I don't agree with g*ganov that it's the backend's job to add BOS, I think you can disable the behavior easily. The original reason to include a BOS token is that the math does not allow you to sample from an empty context; you get an empty tensor otherwise, so you need some sort of filler. The models are usually trained with BOS prepended, but tend to work okay when sampling if you omit it, just can be a bit more random. You can of course always just feed the backend stuff directly, or better yet, make it dump post-tokenization.

>>100365950
yeah, dude, it REALLY is! when you think about it, EVERY model is really 128k ctx. they just all get retarded after 4k!

>>100364847
>>100364890
Strange that the only benchmarks they display are for the 8B model. The 34B might not be worth bragging about.

>>100365958
my llama3miku is coherent at 16k
cope
>>100365985
What is the point of releasing code models with such small context sizes?
Is the new llamacpp flash attention implementation supposed to make token generation slower? I'm offloading if that matters.
>>100365947
>https://huggingface.co/InferenceIllusionist/Llama-3-70B-Instruct-Storywriter-iMat-GGUF
Thank you for your service.
>>100366023
I think the main point of it is to significantly reduce the vram cost of context
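Back-of-envelope for why context costs VRAM at all: a KV-cache size sketch, assuming llama-3-8B-ish shapes (32 layers, 8 KV heads, head_dim 128, which are assumptions you should adjust for your model) at fp16.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, n_ctx, bytes_per_elt=2):
    """Rough KV-cache size: the K and V tensors (hence the 2x) for
    every layer, KV head, head dimension, and context position."""
    return 2 * n_layers * n_kv_heads * head_dim * n_ctx * bytes_per_elt

# llama-3-8B-ish at 8k context, fp16: exactly 1 GiB of cache
gib = kv_cache_bytes(32, 8, 128, 8192) / 2**30
```

Quantizing the cache to 8 or 4 bits per element scales this linearly, which is what the 4-bit KV cache discussion below is about.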
>>100366019
I guess ok for the VRAM-poor, but Llama3 70B Instruct is still the best.

>>100366037
I know about this, but it kills my speed. Maybe it's supposed to be used only if you can fully fit the model in GPU. Sad

>>100366023
>>100366067
Are you using koboldcpp or an older CUDA version?
>>100366067It would have been a whole lot better, had niggernov merged 4-bit KV caches a few months agoBut it'll happen... maybe... probably. I haven't seen an open PR for it
>>100366084
I'm using koboldcpp, the experimental branch. CUDA version is 12.4
*pulls my 4.5 inch penis out and growls huskily.* "who wants to be my kitten?"
>>100365898
The Mikupad code re: token counting is really hand wavy. It just multiplies characters by a constant (honestly a smart move).

>>100365192
Nothing in terms of models. But there are already quant methods that would let you fit 70B as a ~24 vramlet. They just aren't implemented. So I guess that would be the only development possible without surprise models being announced. desu we have time because L3 70b kind of sucks anyway and needs to be de-slopped, who knows how long that will take? For 8B I think the instruct is too over-baked to be useful so people would have to make new finetunes from the base, but the overbaked instruct is where all the applause comes from, so it's mixtral all over again

>>100366230
what's with llama3 and claude and >husky voice, huskily, and so on? what the fuck awful corners of the internet did they train on that had so much of that? they shouldn't have filtered nsfw, because somehow that didn't get filtered but the good shit probably did

>>100365857
Not the case with Intel and AMD GPUs, especially with max VRAM allocation on kernel 6.x.xx. Not sure if it's an Nvidia thing though, but it would make sense given their focus until recently.

>>100366254
>would let you fit 70B as a ~24 vramlet
After using older 2.4bpw and now some 2bit ggufs (with a bit of offloading) I don't think it is worth it.
The feeling I got is that command-r and mixtral are better at those 3.5 to 4 bits you can run them at. 2-bitting is too much brain damage. Maybe that lora anon will make them usable, cause offloading a bit isn't that bad. Also maybe this would be good:
https://huggingface.co/ISTA-DASLab/Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16
If you could actually run it.

>>100364830
MM is a cold fish. Gives two sentence responses and gives up.
>>100366269
*chuckles darkly with a mischievous glint in my eyes* "i don't know. maybe..." *grins wickedly* "...we should tackle this conundrum together?" *i hope against hope that you'll reveal your true desires and succumb to my cunning plan... because what i truly desire is to journey into the future hand in hand, forging an unbreakable bond*

>>100366329
promptlet fucking retard mongoloid alert
>>100366314
I was talking about new methods like that, hqq+ or quip#, including possibly ones that involve additional finetuning like lora anon's. Given the findings that models max out at learning ~2 bits per weight anyway, there's no reason why this shouldn't be possible. It just needs work, mostly backend and coding work, not that much compute. Between that and L3 70B finetuning we don't really need new models, we need to use what we have. Unless we get bitnet or some kind of major advance like Lecun's energy thing.
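The weights-only arithmetic behind "fit 70B as a ~24 vramlet" (this ignores KV cache and activation overhead, so treat it as a lower bound):

```python
def weight_gb(n_params_billion, bits_per_weight):
    """Weight memory in GB (decimal): params * bits / 8 bits-per-byte."""
    return n_params_billion * bits_per_weight / 8

weight_gb(70, 2.0)  # 17.5 GB: squeezes into 24GB with room left for cache
weight_gb(70, 4.0)  # 35.0 GB: needs offloading or a second card
```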
>>100366408
Skimming through the paper, they claimed that their implementation can provide significant speedups on the CPU
Perhaps we won't need to vrammaxx in a few weeks/months, given enough optimizations
>welcome to the midnightmiqu shill thread!
>>100366338Reading this made it finally click for me. LLM cooming is doomed. It will never be good. It is just a description of sex. There is one answer. Genitals being rubbed. LLM can't write about it in different ways. There is one answer. There is only one answer. You can't make thousands of answers, when there is one answer. Now I am back to 2MW awaiting infinite context so I can get a waifu and hear her telling me she loves me over and over again.
>>100365192
In about two months we will get mistral 2 7b much more powerful than llama 3 8b and then a mixtral 2 based on it not long after.

>>100366408
>Perhaps we won't need to vrammaxx in a few weeks/months, given enough optimizations
Right now the optimizations are so good that it throws pip install aqlm after you do pip install aqlm.
2 days after I started using midnightmiqu 1.5 my piss turned red.
>>100366450
Quant?

>>100366460
gguf q8 of course.

Something interesting I'm noticing with adding random text to the beginning of context.
When I added "Fuck you.", the token probability of the right answer to the riddle jumped by like 30%, making it get it right. Doing it in all caps only made it jump up like 5%. Putting "..." instead, the probability jumped 80%. Could models become more intelligent just by adding some filler token(s) to the front of the context?

>>100366464
What kind of monster rig are you running? 3xP40?

>>100366269
Trannies?

>>100366492
>>100208151
https://arxiv.org/abs/2404.15758
Though the paper claimed you needed to train the model to do it.
guys, i'm new to this. i wanted to learn about prompting, are there any resources y'all can recommend for me?
also, i wanted to ask: when the prompt uses things like <|im_start|>, are they a specialized token, or does the model just tokenize them like usual and figure out to treat them differently internally?
i mean, if they are tokenized like any other, i could just write whatever i want inside these tags, right?
>>100366521
good mrning ser

>>100366359
>a model so good that you need a carefully crafted system prompt before you even insert a chat json
Gay.

>>100366521
https://docs.cohere.com/docs/prompting-command-r

>>100366552
Based. Let the newfag start from hardmode.
>>100366416
it's not hard to get variety if you're not a promptlet. i literally NEVER see anything like that. i just meme it. made this card in 1 second.
here's the card description: You are a 18 year old female with blonde hair.
Describe in vastly different ways to describe your character stroking a cock in first person. Each description should be 1-3 sentences, the sentence may be as long or as short as you want.
it's just prompt diff. prompt better if you see shit like that.

Someone needs to make something like this but in chan format instead lol
https://only-bots.ai/

>>100366251
Huh. I thought it was calling the backend or something, since the token count doesn't update if there isn't a backend connected.
Is it just me or is Wiz 8x22 incredibly dommy? I feel like it really spreads its wings when it's talking to dominant characters.

>>100366572
>grins malevolently
RETARD!

>>100366645
You've never seen someone menace with a grin before?

>>100366645
that's not mischievously! *grins malevolently*
I don't know what you guys are talking about right now. *grins neutrally*
>>100366613
A bit opposite experience here, Wiz 8x22 did well, but Command-R did far better. Wiz seemed to ask far too much for consent while C-R just did it.

>>100366572
>sending...
OOOH THERE WE GO
>... pain
oh

>>100366492
Something like that happens when you try a benchmark question with and without the model's prompt format, but in the end it tended to average to the same score when run through the complete set. I think I tried with Miqu and the Arc benchmark.
Anything better than Mixtral 8x7b for 16gb vram? I don't keep up with new shit that often
>>100366251
>>100366610
Mikupad calls the backend API for token counting. The only time it multiplies by a constant is when it needs to convert from token count to character count.
>>100365898
Mikupad adds 1 to the token count to account for the BOS. However, it's very likely that at some point llama.cpp started returning the token count with the BOS already included.
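Toy model of that off-by-one (purely illustrative, not Mikupad's actual code): a frontend that unconditionally adds 1 for BOS double-counts once the backend starts including it, which would explain an empty prompt reading as 2.

```python
def frontend_count(backend_count):
    """Frontend adds 1, assuming the backend did NOT count the BOS."""
    return backend_count + 1

# If the backend now reports 1 for an empty prompt (the BOS itself),
# the frontend shows 2, matching the observation above.
empty_prompt_reported = frontend_count(1)
```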
>>100366798
ic ic

>>100366450
People really like miqu (including midnightmiqu), I found it to be retarded and not good. Lumimaid llama 3 absolutely mogged it imo.

>>100366973
I bet you have a small penis.

>>100367012
kurisufag...

>>100366973
70B or 8B?
what's the best decensored llama3 8B finetune currently
>>100367065
base llama3 with proper prompting

>>100367065
LLaMA-3-8B-Instruct
Sorry to break it to you all but WizardLM-2-7B is vastly superior to Llama3-8B. It's not even funny how retarded L3 is.
>>100365715
HAM hobbyists are such faggots kek

>>100367157
>>100367204
what about the orthogonalized ones?
hey we all hate gradio in here, right? take a look at this shit. sort by another category, then go back to sorting by rank, and it sorts it FUCKING LEXICOGRAPHICALLY lmao
does mergekit work with llama3? is it compatible with the llamacpp quant update? I want to do a simple slerp of 2 8B models i like but the quant conversion script is giving me a "FileNotFoundError: Could not find a tokenizer matching any of ['spm', 'hfft']" and im wondering if its a skill issue, or possibly an issue with the models im using, or if its just not doable yet with the current releases
/g/entoomen why aren't there any local models for music yet? There are tons of sites that look like they use some kind of proprietary big model, so a local model can't be far off, can it?
>>100367491
llama.cpp dev anon will train one with his 10x 4090, trust the plan.

Asking again as no one seemed to know last time: how much more vram does L3-70B require for training vs L2-70B?
I can comfortably train a qlora on L2-70B but run out of vram on L3-70B. 2x3090.

>>100367712
Huh? It should be the same requirements.
i've yet to see anyone post anything good from l3.
>>100366719
CR+ is just so good. Trying L3 storywriter now and seeing how it compares, but CR+ just hops into anything I throw at it with surprisingly little context

>>100367749
It totally isn't. I wonder if the bump in token count has something to do with it...

>>100367414
--share

>>100367882
While I wouldn't call it perfect, for literally 0 effort the output is fine? For example: >>100363023
I've seen both better output from it and other models, but I think nobody can say L3 is bad while being honest?

>>100367712 (me)
This is with axolotl btw.

>>100360206
it's a llama.cpp bug, again
https://github.com/ggerganov/llama.cpp/issues/7006

>>100367062
70b. It's just good.

>>100366329
dude you are clueless, MM is known for having ultra long responses, it's a fact, and the challenge is to make her say less
>>100368064
>>100367958
i said anything GOOD, not anything serviceable. you can get that kind of output from any model 7b+ in existence released in the past 6 months. i'm saying for it supposedly being a shilled 'claude haiku sidegrade', it's just... meh. it's ok.

>>100368128
I don't think it's anywhere near opus tier, and I've seen places where it performed better and worse than gpt-4 for story and erp/chat. In my experience the 7/8b-s are not anywhere as creative as the 70b and make dumber mistakes though. The big cloud models' main advantage is the smarts and sometimes the writing (for example opus? good dataset, not excessively censored in how it was tuned). It should be close to haiku/sonnet though?

>>100368128
>i said anything GOOD
Here:
>>100294353
>>100315340

>>100368243
>when you see migu on stage
>>100366973
llama 3 is 8k context
miqu is 32k context
they are not even comparable. What happens at 0 context is irrelevant. A good prompt can have 2k tokens, then you add some chat and llama 3 is done after 50 messages, while miqu can remember 200+
The meme rope theta context extended llama3 slops are a joke for anything outside of passing the needle in haystack benchmark, they are useless for chatting.
miqu at 22k context (not even full potential):
Adding last 209 messages, starting from 188
Adding 35 pinned messages
PROMPT (20524): ...
llama3 at 8k context (doesn't fit):
Adding last 27 messages, starting from 363
Adding 57 pinned messages
PROMPT (10333): ...
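The message-budget arithmetic above, sketched (the token counts are made up; real frontends count each message with the tokenizer):

```python
def fit_messages(message_token_counts, budget):
    """How many messages, newest-first, fit inside a token budget."""
    total = fit = 0
    for n in reversed(message_token_counts):
        if total + n > budget:
            break
        total += n
        fit += 1
    return fit

history = [100] * 400          # 400 messages of ~100 tokens each
fit_messages(history, 8192)    # llama-3's native window: 81 messages
fit_messages(history, 22000)   # miqu-style window: 220 messages
```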
I got these two fresh images from here btw >>>/v/675404250
>>100368275 did you see there was an over 200k (possibly 600k) context extended finetune with perfect scores, did you try it yet?
>>100368223We want to encourage anon not scare him away.
>>100368275>The meme rope theta context extended llama3 slops are a joke for anything outside of passing the needle in haystack benchmark, they are useless for chatting.Nope, it was tested with RULER too and it had a good score.https://github.com/hsiehjackson/RULER
>>100368223>body betrays her>shivers down spine>low whispers>nibbles on earsame old same old if you ask me
>>100368275Huge cope, llama 3 ropes up very well, and like another anon said, longer context tunes work almost flawlessly due to the architecture.
>>100367712>>100368042If unsloth works with multiple GPUs, try that. There are lots of little optimizations it does that together save a lot of VRAM.Alternatively (shameless shilling), try qlora-pipe. I tested it just now, and was able to train rank 32 qlora on llama3 70b at 2048 context length on 2 4090s. The first GPU only used 21GB, second GPU 23.5. So it's not perfectly balancing memory use (probably because huge vocab in llama3 makes the backprop on the lm_head use more VRAM). If you messed with how it splits the layers between the two GPUs I bet it could go up to 4096 sequence length, or slightly higher lora rank.
>>100368203
To be fair, if the API prices are anything to go by, L3 70B is supposed to be a Turbo and Haiku sidegrade. Sonnet and GPT-4 are way, way more fucking expensive.
>>100368392
Unsloth does not appear to work on multi gpu. Will test qlora-pipe, thanks!
>>100368376
Nah, the older models were more bland. These outputs have good moments.
>>100368223
Linking the same output three times in a row doesn't make it any less shit.
>>100368321
>>100368368
you mean like these?
https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-262k
https://huggingface.co/gradientai/Llama-3-70B-Instruct-Gradient-1048k
it's rope theta slop, i tried it.
>We trained on 34M tokens for this stage, and ~430M tokens total for all stages, which is < 0.003% of Llama-3's original pre-training data.
>0.003%
the model remains retarded: it repeats itself, repeats what the user said, and forgets instructions at the start. It only works in artificial benchmarks. The only difference from the original 8k is that it doesn't just start outputting a soup of random symbols after 8192 tokens, but the intelligence is not there, while miqu will actually use that context, e.g. hinting at something that happened 200 messages ago on its own based on story context, without specifically being asked what happened 200 messages ago. That's the difference between being trained on large context from the get-go and being a slop finetune.
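For reference, the "rope theta" knob these finetunes turn is just the base of the rotary position embedding (RoPE) frequencies. A minimal sketch of the idea (Llama-3 ships with rope_theta = 500000; the larger value here is purely illustrative, not Gradient's exact setting):

```python
def rope_freqs(head_dim, theta):
    # Per-pair rotation frequencies used by rotary position embeddings (RoPE).
    # The rotation angle applied at position p for dimension pair i is p * freqs[i].
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]

stock = rope_freqs(128, 500_000.0)        # Llama-3's stock base
stretched = rope_freqs(128, 8_000_000.0)  # illustrative larger base

# A larger theta shrinks every frequency (except the first, which is always 1),
# so positions past the original 8k window rotate more slowly and stay
# distinguishable -- that's the whole context-extension trick being argued about.
assert all(s < f for s, f in zip(stretched[1:], stock[1:]))
```

Whether the model can actually *reason* over those stretched positions without further training is exactly the point of contention above.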
>>100368425
It is aging like fine wine.
>>100368432
The RULER benchmark proved you wrong, though.
>>100368482
Is your wine made of milk?
>>100368482
>le benchmark
i don't care, i have my 400 messages chat and switch between different models. Miqu handles it, Llama 3 doesn't.
>>100368432
>That's the difference between being trained on large context from the get go
Miqu is a llama 2 finetune, which is natively 4k context. Nobody knows exactly what Mistral did, but my guess is continued pretraining of the model at 4k-8k sequence length, followed by one round of context-extension fine tuning to 32k, followed by instruction fine tuning. Nobody fully trains from scratch at 32k. The existing long-context extensions of llama 3 simply aren't doing a good job, or aren't using the right datasets/techniques. But in general that's how everyone extends context length.
>>100368432
A few billion tokens of finetuning is usually sufficient for near-flawless long context, as some Meta paper showed before, but I can't say much as I personally don't have enough memory to run such long contexts. The instruct models overall are overbaked and have a repetition issue, but you can mitigate it by using rep pen or the DRY sampler.
Most models, including the biggest ones, will favor recent output over the oldest (usually the first lines like the system prompt, plus the recent stuff, are favored). But if you were to prompt it to recall something from the middle of the context, does it fail? Because I do not expect that to fail.
>>100368525
Enjoy your placebo!
>>100368533
it doesn't fail if you prompt it - that's "needle in the haystack". E.g. at 09:45AM 6th May the character says "I like donuts". If I prompt it to tell me what the character said at 09:45AM 6th May, it will say "I like donuts", but if i prompt it "what is the character's favorite food" it will hallucinate.
>>100368572
it may as well be, i didn't run 100x identical tests, this is just anecdotal evidence.
>>100368530
>The existing long context extensions of llama 3 simply aren't doing a good job, or aren't using the right datasets / techniques.
They simply aren't doing anything; it's just some companies taking credit for how well the original model scales by changing the rope theta.
>>100368619
That's nice and all, but RULER proved you wrong.
>>100368243
omg it migu panties
>>100368666
lemme see if i can reproduce this example now with my 400 messages chat again
>>100368533
Honestly, I've seen both L3 and even the biggest cloud models fail that test, where they had forgotten subtle facts from a few paragraphs ago. You can of course go "do you remember what you wanted to do earlier, why didn't we do that" (+ some hint as to how early) and it will go "OH" and realize it. Sometimes it fails badly; I've seen even the biggest "long-range context" models (e.g. Claude) fail at it, and gpt-4 too, and llama too. But I've also seen them all succeed at it too, so YMMV?
>>100368417
i mean i can get 'good moments' from a 7b.
>>100368744
as a fellow poorfag, there's something about mistral 7b's prose that just irks me.
>>100368744
No, because that's extremely verbose and bland. It's hard to read.
>>100368791
>>100368771
that's actually l3 70b lol
>>100368829
Congrats on your slopped system prompt.
>>100368271
Is that why front row seats cost more?
>>100368704
I've used Claude a LOT. Its advertised 200k context does not apply to this use case. 200k worth of tokens covering a set of documents - yeah, it can probably do some QA on that. Maintaining a character over a prolonged role-play that relies on picking up subtle hints and characterization over a long-form log? Nah.
From my testing, around 12k tokens of context it starts to get mixed up in minor ways (forgets things, becomes less adept at picking up subtle hints, starts adding contradicting information to the log - that sort of thing). At 16k-32k it becomes worse. Still relatively minor, but definitely noticeable. Past 32k it can get schizo. I've had characters completely alter their personality from message to message, alter their speaking style, forget major plot elements, forget even minor, relatively recent developments (such as the current location we are in now).
I just limit the context to 16k when talking to Claude. Past 16k the experience becomes too frustrating and immersion-breaking, plus the speed takes a big nosedive. I just use summarization and an array of memories on each character. Works a lot better that way. The model's intelligence also takes a hit, by the way, even outside of the basic forgetting stuff. It's not full retardation, but again - noticeable.
It's a real shame. I've been experimenting a lot with a custom local RAG flow. CommandR+ is actually incredibly solid all the way up until 32k. CommandR+ at 32k-64k is like Claude at 16k-32k.
>>100368306
I saw another post of this on twitter too. Don't know what to make of this.
>>100368885
Yes, and they're quite obviously worth it.
>>100368899
Nah, you're just a deranged NAIshill. Claude can make perfect use of the context.
>>100368900
huge if true
>>100365296
>that new rep penalty magic method the name of which i forgot.
DRY. It won't help with swipe variety if the reply is novel, since it only works when the model outputs a sequence of tokens that's already in the context.
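The mechanism being described can be sketched like this (a toy version of the DRY idea; parameter names and the exact penalty formula here are illustrative, not the real sampler's):

```python
def dry_penalty(context, candidate, base=1.75, multiplier=2.0, min_match=2):
    # If appending `candidate` would extend a token run that already occurred
    # earlier in the context, penalize it, growing with the length of the
    # repeated run. Novel continuations are untouched -- which is exactly why
    # DRY doesn't help swipe variety.
    best = 0
    for i in range(len(context)):
        if context[i] != candidate:
            continue
        # how far the tokens before position i match the current context suffix
        n = 0
        while n < i and context[i - 1 - n] == context[-1 - n]:
            n += 1
        best = max(best, n)
    if best < min_match:
        return 0.0
    return multiplier * base ** (best - min_match)

ctx = ["she", "shivers", "down", "her", "spine", ".", "shivers", "down", "her"]
print(dry_penalty(ctx, "spine"))  # extends a 3-token repeat -> penalized
print(dry_penalty(ctx, "arm"))    # novel token -> 0.0
```

The penalty would then be subtracted from (or divided into) the candidate's logit before sampling.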
>>100368930
I don't use NAI. 8k context on a 14B is not enough for my use case. I use GPT, Claude, and the usual array of locals. GPT by far best maintains intelligence and context awareness over long contexts, but then you've got GPT prose. Miqu is also VERY solid up to 32k. Claude is still the "best" model, but to claim "it can make perfect use of the context" you're either retarded, ignorant, shilling, or some combination of the three.
>>100368930
>/aids/ schizo is calling random /lmg/ers NAIshills again
You gonna shill your shitty malware again?
>>100368976
I have seen your post in /aids/, NAIshill.
https://arch.b4k.co/vg/thread/475781740/#476056521
>>100369015
And there's the raid-inciting crosspost.
>>100368930
get the fuck out with your unrelated nonsense
>>100369035
Keep spreading propaganda against Claude, NAIshill. Claude has perfect context.
>>100369015
Not your army. Take your hate boner for /aids/ somewhere else.
>>100369015
That's not me, and the writing style is completely different. They're just noticing the same issue.
Ideaguy time. Remember tree of thought, or the /lmg/ version, tree of big niggas? The benefit was from considering a diversity of options. Seems you would do even better if each alternative was presented by a different model. Kind of the original information-theory mixture-of-experts concept (not the LLM-specific router MoE, obviously).
Too much memory to be worth considering, cpumax anon aside. But if you could get a small group of people together, they could all share this setup: each would host one of the models, and either summarize on their own or have one be the dedicated summarization model. The communication is just the text output at the end of generation, so communicating over the Internet is fine. Would hardly even need anything implemented. You could have like miqu, l3 (or the Nvidia fine-tune), cr+, dbrx. Presumably you don't want to bother with a fine-tune plus its base, although maybe that would still be helpful. Maybe even cloud models too.
If I ever convince any of my friends to get seriously into local I'll give it a try. Well, I guess I could easily do (slow) evaluations of this idea myself, huh? Maybe I will.
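Since each participant only ships final text, the plumbing really is trivial; a minimal sketch where each hosted model is just a callable string-to-string function (e.g. a thin wrapper around whatever endpoint a friend exposes -- the prompt wording is made up):

```python
def committee(prompt, models, judge):
    # models: list of callables str -> str, one per hosted model.
    # judge: callable that reads all drafts and picks/refines the best one
    # (could be the dedicated summarization model, or one of the same models).
    drafts = [ask(prompt) for ask in models]
    ballot = prompt + "\n\nCandidate answers:\n" + "\n---\n".join(
        f"[{i}] {d}" for i, d in enumerate(drafts))
    return judge(ballot + "\n\nPick and refine the best candidate answer:")

# Stub "models" just to show the plumbing; real ones would hit miqu, l3, cr+, dbrx.
demo = committee("2+2?", [lambda p: "4", lambda p: "five"], lambda b: b)
```

The design point is that only the ballot text crosses the network, so latency and bandwidth between participants barely matter.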
>>100369061
No, it's just you. /aicg/ doesn't have that problem. I don't have that problem. And Claude has perfect scores in benchmarks. You're the only person spreading this.
Now go back to /aids/.
>>100369086
/lmg/ - Local Models General
Then again, it doesn't surprise me that you're illiterate, given that slop you call output.
Stop feeding schizos, anons.
>>100369104
They're the best logs posted in the entire thread, and they're fun to read.
Any thoughts on Qwen 110B? It gets 3rd place on creative writing for EQ-Bench and is pretty decent on the normal EQ-Bench.
>>100369114
you mean this slop over here? >>100368899
>>100369117
Yes, that benchmark doesn't work.
>>100369086
I agree with the guy claiming Claude has its limitations, same as every LLM out there. If Claude or GPT-4 didn't have that "problem", your prefill jailbreaks wouldn't work, where you spam 1k-token system prompts or long replies for the assistant role. Anyway, this behavior is well known and documented. There are papers analyzing how much GPTs pay attention to the context, and pretty much most of them pay attention to the start (sometimes more strongly if you tune for that) and most strongly to the most recent lines. There are exceptions to this depending on the type of positional embedding used, but for most models this applies. Of course most models that are trained for long context can reference back to arbitrary points, but to expect them to not forget small details in the middle of long contexts is silly; the best you can hope for is that it gets the "gist", so to say.
>>100369117
>chink_article_about_chink_models_cheating_benchmarks.html
>>100369117
no one gives a fuck about those mememarks.
>>100369147
>If Claude or GPT-4 didn't have that "problem", your prefill jailbreaks wouldn't work where you spam 1k token system prompts or long replies for the assistant role.
Well, you don't do that with Claude. So maybe try again with a real argument? It sounds like you have never used it.
>>100369117
>creative writing
>evaluated by LLM
You have to be a black niggerlicious retard if you take it seriously.
>>100369147
Good argument, anon; the only flaw is you directed it at a zero-IQ retard.
>>100369194
There's no argument. Just propaganda.
>>100369178
I've seen people do rather long jailbreaks for it; I'm not sure what the minimal size for a jailbreak is, though. Again, it depends on which Claude version you're talking about. I've tried most, and typically a few-hundred-token jailbreak, or else it may choose not to write lewd stuff, but once it gets started it does it well.
>>100369214
You only need a sentence in the prefill to jailbreak it, and it has nothing to do with long context length.
>>100369239
Eh, I've seen refusals before with most models including Claude if you use very minimal jailbreaks; of course you can just edit the response after, or regenerate. I can't recall ever seeing refusals with longer jbs.
>>100369246
If you didn't have the ability to use the prefill and put words in its mouth, you wouldn't jailbreak it.
>>100369255
It can work without it, with long system contexts or by trying multiple turns. Give it a go sometime; it's just a bit less reliable. Ultimately jailbreaks are a natural consequence of ICL (in-context learning) working, and of the fact that these are all next-token predictors.
>>100369191
not him, but the sample texts for each model are a pretty easy way to evaluate their creative writing skills and styles without having to download and run dozens of large models yourself.
you're right that the LLM evaluations can be pretty dumb, so the scores/ranks on the leaderboard itself don't really work too well.
>>100369246
Simple one-sentence prefill works for everything with Claude in my experience. I do start getting refusals past 16k on stuff that it had no problems with earlier in the context. Typically just a single regen takes care of it. I never see a refusal under 16k context. The 16k-32k range is like you're switching to a different model, basically. Intelligence and recall both take big hits. At 0-16k context it's the smartest model for RP out there, and it's not even a contest. Past that, I have to either switch models, constantly do editing and error correction, or just do the array-of-memories thing and roll the context up.
>>100369344
>I do start getting refusals past 16k
>I never see a refusal under 16k context.
No, it's backwards. The more context you have, the easier it is to jailbreak. Jailbreaking with no context is the hardest thing.
You're just making shit up.
>model A
>traditional sampler settings: coherent
>dynatemp/smoothing: coherent
>model B
>traditional sampler settings: coherent
>dynatemp/smoothing: totally schizo
What causes this? Why do some models seem to "not like" the exotic sampling methods?
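One plausible answer: entropy-scaled samplers map the flatness of the model's raw distribution to a temperature, so a model whose raw distributions are natively flatter gets pushed hot everywhere. A minimal sketch of the dynamic-temperature idea (the min/max values and scaling are illustrative, not any backend's exact defaults):

```python
import math

def dynatemp(logits, t_min=0.5, t_max=2.0):
    # Softmax over the raw logits.
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    probs = [e / z for e in exps]
    # Normalized entropy in [0, 1]: 0 = fully confident, 1 = uniform.
    ent = -sum(p * math.log(p) for p in probs if p > 0)
    frac = ent / math.log(len(logits))
    # Confident distributions sample cold, flat ones sample hot. A model whose
    # raw distributions are already flat lives near t_max, i.e. "totally schizo".
    temp = t_min + (t_max - t_min) * frac
    exps_t = [math.exp((l - m) / temp) for l in logits]
    zt = sum(exps_t)
    return temp, [e / zt for e in exps_t]

peaked_t, _ = dynatemp([10.0, 0.0, 0.0, 0.0])  # confident model -> cold
flat_t, _ = dynatemp([1.0, 1.0, 1.0, 1.0])     # flat model -> hot
```

So two models can behave identically at a fixed temperature yet diverge wildly under dynatemp if their typical output entropy differs.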
>>100369066
Do it.
>>100369344
I do have a system prompt of legit instructions that's usually ~1k tokens. The card is usually 500-1000 tokens. The last couple of weeks I've been tinkering with an elaborate RAG pipeline that pulls relevant samples of fiction writing from pinecone and primes (and continually updates) the context with ~4k tokens of relevant text. My DB includes books on fiction writing, erotica, psychology, and symbolism. But with or without the initial RAG pull - 2k context of legit instructions and definitions and a prefill (I use an open XML tag, which seems to work better than the "OK no problem, here is my response" approach) - I never see a refusal under 16k.
>>100369380
Yes, this is counter-intuitive. But it's real. The only "jailbreak" part that addresses anything related to censorship is my prefill. The rest is all relevant context and instruct.
>>100369417
It's not real. You're just mentally ill.
>>100369066
MoBN
https://huggingface.co/Sao10K/L3-Run1
sovl - trained on heavily filtered c2 logs lmaooo (800k dropped to 8k entries)
keeps in char well, is horny, you may need swipes as it's a smol 8b model
YMMV
>>100369801
what a slopjob
Llama 3 70B is actually really good. It seems so human-like with the way it interjects about how the story is so depressing and it wants to end the story and change to a different, happier story.
>>100369914
unprompted, llama 3 told me to seek help from a therapist after a few messages.
>>100369914
base model and not instruct?
>>100369943
I'm using the instruct with 262K context from gradientai.
I don't know how you subhumans turn something like breaking character and morally lecturing you into a positive. L3 shills are just literally braindead.
Creepy miku archivist here. I have just found a few more of his pics that were previously lost to twitter's incomplete timeline view (it doesn't perfectly show you all of someone's posts; fuck you, twitter). In addition, the archive now has more NSFW, due to me finding out there was an actual tag for artwork related to this specific doll. It also has more photos/media that other people took of the doll. Might upload tomorrow.
>>100369914
>refusals
>good
Do you like getting cucked and cockblocked? What the fuck is wrong with you?
>>100369978
Mind sharing settings? I'm getting repeats with anything involving instruct.
>>100369990
>cuck model has cuck fans
>>100369993
Looking forward to it.
>>100370027
I have rep penalty at 1.03, which I think is the only one that really matters. I've tried playing around with and without smooth curve, and different temps.
>>100370066
creepy...
>>100370066
What about for context/instruct? I've been running baseline ST settings and it's been causing problems.
>>100369801
>literally only 1% is usable
I bet if you manually curated the remaining logs it'd drop to 1k.
>>100369338
>>100369191
There is nothing wrong with LLM evaluations. If a human did it, you would accuse them of being biased. And no human could possibly rate that many samples; it's tedious as fuck if you have ever tried it. You end up skimming and missing the countless subtle mistakes the LLM makes, which make all the difference between great and shit models.
Yes, the LLM judge will miss things and make errors, but so will humans, maybe more so. But over thousands of samples, errors will average out. It's only important that the judge's ratings correlate with better writing, not that they be literally perfect every time.
And it's not like the judge is chosen randomly. They have a separate, more interesting competition for judging. The judges are scored based on how well their ratings correlate with the arena score and EQ-Bench, and how well they are able to identify which model wrote a given story. If a better judge model comes out, they will switch to that model.
The best judge, Claude Opus, gives ratings that have a 93% correlation with the lmsys arena leaderboard. So whatever it is measuring is at least strongly correlated with the other standard benchmark of model quality. That is a crazy high correlation. Lmsys's own automated benchmark is only 1% higher than that.
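For the curious, that 93% figure is just a correlation coefficient between two score lists over the same set of models; the computation, with made-up illustrative numbers (NOT real leaderboard data):

```python
def pearson(xs, ys):
    # Pearson correlation between two score lists, e.g. a judge model's
    # ratings vs arena Elo for the same set of models.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Five hypothetical models scored by a judge (0-100) and by arena Elo.
judge = [78, 71, 65, 55, 40]
elo = [1250, 1190, 1160, 1100, 1020]
r = pearson(judge, elo)  # close to 1: the two rankings largely agree
```

A high r only says the judge orders models similarly to the arena, not that either measures writing quality in the absolute.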
>>100370084
What am I supposed to do with the archive I already downloaded?
>>100370072
I use this for both. Silly Tavern added it recently; not sure if it's in the main branch since I pull from staging.
>>100370088
I'll be uploading to the litterbox again, so you can just delete the entire old one. I think I changed some filenames, so you should not keep it anyway.
>>100369914
>it interjects about how the story is so depressing and they want to end the story and change to a different happier story
huh, that sounds similar to Claude
>>100365146
Is there a way to guarantee you get gpt2-chatbot on that site? It seems to be random.
>>100370052
fearing for my life with miku
>>100370077
It's my main work right now. I'm curating and manually cleaning and editing the best entries; the filtering is done. This was just to see if the logs would work, and it did what I wanted. The final cleaning is the hard part now.
>>100369914
>too dumb to make interesting outputs
>just smart enough to subtly steer it towards gptslop directions
Gayest shit ever
Wholesome Miku for a palate cleanser
>>100369990
>>100370141
samefag
we heard you the first time
>>100370084
>There is nothing wrong with LLM evaluations. If a human did it you would accuse them of being biased.
LLMs are even more biased.
>And no human could possibly rate that many samples.
Literal skill issue. Learn to read faster, fag.
>You end up skimming and missing the countless subtle mistakes the LLM makes, which make all of the difference between great and shit models.
You will certainly notice all the -isms and positive slop of the model and rate it lower, which LLMs don't notice.
>The judges are scored based on how well their ratings correlate with the arena score, eq bench
Arena was gamed multiple times (see Starling) and EQ-Bench is another multiple-choice mememark.
>>100370174
>a literal cybercuck defending cuck models
What color is your cock cage?
a model that just copy pastes entire paragraphs from previous messages cannot be considered good.
I am FUCKING tired of being limited by my GPU and waiting several seconds to get a response to jerk off.
Is there truly no dedicated hardware AI accelerator or something in the world which can make LLMs faster?
GPGPUs aren't fast enough for me.
>>100370464
Sure bud, you can spend 150k USD on a 7b machine, right?
>>100370275
>He doesn't deny the samefag
my radar is undefeated
>>100370484
>he keeps defending it like the cuck he is
What brand is your estrogen?
>>100370243
oh god, a reddit point-by-point. i sure hope i didn't make any spelling errors.
>LLMs are even more biased.
probably false, but it doesn't really matter or have anything to do with my point. People will not trust human evaluations. Even if you had a perfectly unbiased judge, no one would believe you and it would get shit on all the same. And you don't have a perfectly unbiased judge.
>Literal skill issue. Learn to read faster, fag.
Fine then, do it, faggot. I'm not volunteering to read 10,000 LLM-generated storyposts. I couldn't possibly find the time even if I wanted to.
And speed reading is total hogwash. It's proven they are just doing fancy skimming, missing important details, and not processing the information being read as well.
>You will certainly notice all -isms and positive slop of the model and rate it lower, which LLMs don't notice.
False. Opus rips overly positive GPT-3.5 Turbo outputs to shreds, pic related. Better models notice these things just like a human would, possibly better. And positivity bias is far from the only aspect that matters; basic failures to structure the plot, logical errors, non sequiturs, not following the prompt, etc. are all important.
>>100370243
>>100370535
>Arena was gamed multiple times (see starling) and eqbench is another multiple choice mememark.
Which is irrelevant. Even if those benchmarks are imperfect, they still correlate with better models on average. And the fact that this benchmark correlates highly with those benchmarks shows it's not random noise. It is measuring something that correlates with better models.
And you are just wrong here too. Starling got better at instruction following through enormous amounts of RLHF, so it did better in the arena, where that is a critical factor. It literally is a better model for what it was designed to do and what is being measured. But it never got anywhere close to the top of the leaderboard, because that is not enough. And multiple-choice benchmarks are not bad; I can't even imagine the thought process that led you to that retarded comment.
How to properly prompt Mixtral Instruct 8x7?
INST or ### Instruction?
And if so, where in sillytavern?
Is it possible to incorporate [[### Response: (ect.ect.ect.): (length = medium)] at all with INST??
I can adjust samplers all day, but it's all worthless if the prompt is wrong.
>>100370479
Do not bully me, anon. I'm just a poor hardware engineer, and I just want to coom.
>>100370614
>engineer
Here's somewhere to start. If you engineer a new way, let us all know.
https://rentry.org/Mikubox-Triple-P40/
https://rentry.org/V100Maxx
https://rentry.org/miqumaxx
the miqu shit is pretty reddit tier
>>100369066
Tree of thought is about breaking problems down into smaller pieces the models can solve, right? I think what you are describing is simpler than that: just having models generate different outputs and selecting the best?
>>100370639
The human brain is the most powerful computer. Simply read the model weights and run inference from your memory.
>>100370464
maybe combining multiple GPUs/computers together?
Remember when MM used to stand for MythoMax? Time flies, anons...
>>100366755
Mixtral Smaug, q4_0. I leave 2gb vram free and offload the rest.
Vidu: a Highly Consistent, Dynamic and Skilled Text-to-Video Generator with Diffusion Models
https://arxiv.org/abs/2405.04233
>We introduce Vidu, a high-performance text-to-video generator that is capable of producing 1080p videos up to 16 seconds in a single generation. Vidu is a diffusion model with U-ViT as its backbone, which unlocks the scalability and the capability for handling long videos. Vidu exhibits strong coherence and dynamism, and is capable of generating both realistic and imaginative videos, as well as understanding some professional photography techniques, on par with Sora -- the most powerful reported text-to-video generator. Finally, we perform initial experiments on other controllable video generation, including canny-to-video generation, video prediction and subject-driven generation, which demonstrate promising results.
https://www.shengshu-ai.com/vidu
text-to-video model by what seems to be a spin-off ai company (shengshu) from tsinghua university (china's top AI one). their website doesn't load for me on the 2 browsers I tried, even when I used pia's china VPN (maybe it's blocked internally in china). tsinghua sometimes open sources their stuff (GLM models), so maybe they'll release this one later, after they scale up the model like they hint at in the conclusion.
QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM Serving
https://arxiv.org/abs/2405.04532
>Quantization can accelerate large language model (LLM) inference. Going beyond INT8 quantization, the research community is actively exploring even lower precision, such as INT4. We uncover a critical issue: existing INT4 quantization methods suffer from significant runtime overhead (20-90%) when dequantizing either weights or partial sums on GPUs. To address this challenge, we introduce QoQ, a W4A8KV4 quantization algorithm with 4-bit weight, 8-bit activation, and 4-bit KV cache. QoQ stands for quattuor-octo-quattuor, which represents 4-8-4 in Latin. QoQ is implemented by the QServe inference library that achieves measured speedup. The key insight driving QServe is that the efficiency of LLM serving on GPUs is critically influenced by operations on low-throughput CUDA cores. Building upon this insight, in QoQ algorithm, we introduce progressive quantization that can allow low dequantization overhead in W4A8 GEMM. Additionally, we develop SmoothAttention to effectively mitigate the accuracy degradation incurred by 4-bit KV quantization. In the QServe system, we perform compute-aware weight reordering and take advantage of register-level parallelism to reduce dequantization latency. We also make fused attention memory-bound, harnessing the performance gain brought by KV4 quantization. As a result, QServe improves the maximum achievable serving throughput of Llama-3-8B by 1.2x on A100, 1.4x on L40S; and Qwen1.5-72B by 2.4x on A100, 3.5x on L40S, compared to TensorRT-LLM. Thus, QServe effectively reduces the dollar cost of LLM serving by 3x.
https://github.com/mit-han-lab/qserve
from MIT. code ready. focused on reducing CUDA core overhead, so not sure how applicable it is for gamer cards. Johannes will probably get something out of it though.
I need technical help:
When I load a model split between my 2 gpus, everything works fine, but when I toggle row split, even if every other setting is the same, it spits an OOM error and crashes.
what do?
I don't think row split should change memory requirements...
>>100371132
I can't run it with row split on 2x3090 either.
I went back to mixtruct and it's really not that bad. All the recent 70-120bs made me forget that 8x7b is probably the best I'll run on 24gb.
2.4bpw is fucking shit, it's retarded as hell. q5km and q4km both run at 1t/s and are too slow to fuck with when they're still dumber than Sonnet.
So MoEs are probably the best for 24gb. I have no idea how to do this shit, but I'm thinking of reading up and self-merging llama3 8b, mistral v2 7b, and westlake v2 7b to make three 11bs. Then create a MoE of llama3 11b, mistral v2 11b, westlake v2 11b, and fimbu 11b: a 4x11b MoE that'd actually fit and run well at high quant on 24gb.
The DPO and rpcal shit are slop memes. So is the nous gptslop dataset. Any ideas what else I could throw in this?
>>100368392
Still experimenting with parameters, but I'm OOMing when trying rank 32 at 2048 context on my 2x3090 setup. Do you have a link to the toml config you used and/or the command-line call? I don't think I screwed up, but you never know.
>>100371201
sounds good, let us know when it's ready
>>100364633
>pic
Is that supposed to be Ollie?
xLSTM: Extended Long Short-Term Memory
https://arxiv.org/abs/2405.04517
>In the 1990s, the constant error carousel and gating were introduced as the central ideas of the Long Short-Term Memory (LSTM). Since then, LSTMs have stood the test of time and contributed to numerous deep learning success stories, in particular they constituted the first Large Language Models (LLMs). However, the advent of the Transformer technology with parallelizable self-attention at its core marked the dawn of a new era, outpacing LSTMs at scale. We now raise a simple question: How far do we get in language modeling when scaling LSTMs to billions of parameters, leveraging the latest techniques from modern LLMs, but mitigating known limitations of LSTMs? Firstly, we introduce exponential gating with appropriate normalization and stabilization techniques. Secondly, we modify the LSTM memory structure, obtaining: (i) sLSTM with a scalar memory, a scalar update, and new memory mixing, (ii) mLSTM that is fully parallelizable with a matrix memory and a covariance update rule. Integrating these LSTM extensions into residual block backbones yields xLSTM blocks that are then residually stacked into xLSTM architectures. Exponential gating and modified memory structures boost xLSTM capabilities to perform favorably when compared to state-of-the-art Transformers and State Space Models, both in performance and scaling.
neat, but really need to see it scaled further.
Without having a schizo breakdown, please, have any of the NAI text models ever been released or leaked? I know people got their hands on their SD 1.5 model at some point.
>>100371280
I think their llms were in the same leak as imagegen; you can probably still find the torrent in sdg. But those were pre-Kayra, pre-llama1 finetunes, as shitty as Erebus and Pyg.
>>100371212 (me)
Two things:
1. I realized I was accidentally trying to train on Command-R 35B. It OOM'd on this! No idea why.
2. Completely the opposite of your description, setting pipeline_stages to 2 from 1 made it able to load Command-R 35B into VRAM, and L3-70B as well. In fact, it's only using 17/18 GB. I have no idea what's happening or how much time is left, as all it's saying is "before GAS splitting, batch size: ..." (which I assume is once per iteration; if so, good speed!).
Would be nice to have a time indicator. Maybe I need to figure out tensorboard for this?
>>100371328 (me)
>17/18 GB.
17 + 18 GB, I mean.
>>100371295
That's a shame. Maybe I'll see if I can dig it up anyway to dick around with the old models again.
>>100371280
Googled because I swear I remembered hearing something about it, and found this: https://huggingface.co/NovelAI/calliope-legacy
So aside from the leak the other anon mentioned, they officially released a retired model too.
>>100364633
What is the best model I can run on runpod for roleplay? It's been xwin forever... Stood up to mixtral. Haven't asked this question in months, so there must be a new good one?
No meme flavor-of-the-month models.
>>100370140
Late, but good luck with that. I really enjoyed your other models.
>>100370140
Shouldn't you deprecate c2 in favour of c3? The quality is night and day.
>>100371420
They were all flavour-of-the-month models at some point, dumbass.
>>100370105
>hurr durr i renamed a couple files so now you have to redownload the entire 1gb archive all over again
Do you think bandwidth and drive space grow on trees?
>>100371444
Using logs from gpt or claude will just create those annoying -isms anons hate. Instead of finetuning on shit data, grab a collection of ebooks and use those. You don't have to use the entire books3 database, but books will be better than logs any day.
llama 3 responses are so short
>>100371493
Alright, which one of you is an author on this paper?
>>100371481
Opus isn't nearly as bad at that. It has -isms, but it's roughly on par with human prose. I've read random Opus logs; they're mostly passable even when {{user}} is an incurable retard - something that can't be said about Claude 2 logs.
>>100371493
The transformer-hater anon, I'd wager.
>>100370130
Use the direct chat tab at the top.
>>100371017
I'm aware of the way they do the matrix multiplication for quantized weights; I initially did the same thing in https://github.com/ggerganov/llama.cpp/pull/4801 .
The problem is that if you use only a single scale per row/column, it makes q8_0 worse than q4_0 (when using 8 bits for both the weights and the activations).
According to the authors, 4-bit quantization is "considered nearly lossless in terms of accuracy", and I definitely disagree.
At some point, once I've worked out the issues with e.g. FlashAttention, I want to revisit my int8 tensor core matrix multiplication implementation using the knowledge I gained from talking to an NVIDIA engineer. I think I'll be able to do it in such a way that it is actually nearly lossless, essentially the same precision as with mul_mat_q (labeled "llama.cpp int8 intrinsics" in the plot).
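To illustrate why the single per-row scale hurts: one outlier weight sets the scale for the entire row, crushing everything else to zero, while llama.cpp-style q8_0 gives each 32-value block its own scale. A toy sketch in pure Python (not the actual kernel code):

```python
def quantize_blocks(values, block_size=32):
    # One fp scale per block of 32 values, q8_0-style: int8 weights in [-127, 127].
    blocks = []
    for i in range(0, len(values), block_size):
        chunk = values[i:i + block_size]
        amax = max(abs(v) for v in chunk) or 1.0
        scale = amax / 127.0
        blocks.append((scale, [round(v / scale) for v in chunk]))
    return blocks

def dequantize(blocks):
    return [scale * q for scale, qs in blocks for q in qs]

# One big outlier in a row of small weights.
row = [0.01] * 31 + [5.0] + [0.01] * 32

# Single scale for the whole row: the small weights all round to zero.
row_scale = max(abs(v) for v in row) / 127.0
per_row = [round(v / row_scale) * row_scale for v in row]

# Per-block scales: only the outlier's own block is affected.
per_block = dequantize(quantize_blocks(row))

mean_err_row = sum(abs(a - b) for a, b in zip(row, per_row)) / len(row)
mean_err_block = sum(abs(a - b) for a, b in zip(row, per_block)) / len(row)
assert mean_err_block < mean_err_row
```

The real format additionally packs the int8 values and stores the scales as fp16, but the accuracy argument is the same.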
vAttention: Dynamic Memory Management for Serving LLMs without PagedAttention
https://arxiv.org/abs/2405.04437
>Efficient use of GPU memory is essential for high throughput LLM inference. Prior systems reserved memory for the KV-cache ahead-of-time, resulting in wasted capacity due to internal fragmentation. Inspired by OS-based virtual memory systems, vLLM proposed PagedAttention to enable dynamic memory allocation for KV-cache. This approach eliminates fragmentation, enabling high-throughput LLM serving with larger batch sizes. However, to be able to allocate physical memory dynamically, PagedAttention changes the layout of KV-cache from contiguous virtual memory to non-contiguous virtual memory. This change requires attention kernels to be rewritten to support paging, and serving framework to implement a memory manager. Thus, the PagedAttention model leads to software complexity, portability issues, redundancy and inefficiency. In this paper, we propose vAttention for dynamic KV-cache memory management. In contrast to PagedAttention, vAttention retains KV-cache in contiguous virtual memory and leverages low-level system support for demand paging, that already exists, to enable on-demand physical memory allocation. Thus, vAttention unburdens the attention kernel developer from having to explicitly support paging and avoids re-implementation of memory management in the serving framework. We show that vAttention enables seamless dynamic memory management for unchanged implementations of various attention kernels. vAttention also generates tokens up to 1.97x faster than vLLM, while processing input prompts up to 3.92x and 1.45x faster than the PagedAttention variants of FlashAttention and FlashInfer.
from Microsoft (India). seems better than vLLM's PagedAttention. no link to any code, but a lot of their wording seems to imply it was made with open sourcing in mind, so who knows
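The core trick — reserve a big contiguous *virtual* range up front and let the OS attach physical pages only on first touch — can be demoed on the CPU with an anonymous mmap. The paper itself targets GPU memory via CUDA's low-level virtual memory APIs; this is only a CPU analogy, and the sizes here are arbitrary:

```python
import mmap

# Reserve a large contiguous virtual region up front. On Linux, physical
# pages are only committed when a page is first written (demand paging),
# so the reservation is cheap even if most of it is never used.
KV_BYTES_MAX = 64 * 1024 * 1024
kv = mmap.mmap(-1, KV_BYTES_MAX)  # anonymous mapping

# "Append" KV entries at growing offsets; the address space stays
# contiguous, so a kernel could index it with plain pointer arithmetic --
# no userspace page table and no rewritten attention kernel.
for step, token_bytes in enumerate([b"k0v0", b"k1v1", b"k2v2"]):
    kv[step * 4:(step + 1) * 4] = token_bytes

result = bytes(kv[0:12])
kv.close()
assert result == b"k0v0k1v1k2v2"
```

Same contrast as in the abstract: PagedAttention makes the *kernel* handle non-contiguous blocks, while vAttention keeps the layout contiguous and pushes the dynamic allocation down to the virtual memory system.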
>>100369801>800k dropped to 8k entriesBecause they were 800k responses, not conversations.
>>100371511>I've read random Opus logs, they're mostly passable even when {{user}} is an incurable retard - something that can't be said about Claude 2 logs.The difference is not really that big. What a moron.
>>100371692That implies that each conversation was roughly 100 unique messages, which is rarely the case with these logs.
>>100371738>The difference is not really that big. What a moron.It is fucking huge. What a moron.
anyone using ollama? how can i limit the number of tokens it gives in response?
Dumb question, anons: how can I find all the base models that aren't finetunes of another?
The only ones I know are the Llama and Mistral base models. What about all the other original ones?
Is there any info/guides on tuning Mixtral-8x22B? I wanna try my hand at making a limarpv3 version.
>>100371738>>100371752With thousands of messages to opus and sonnet, they're both the same except opus is better at complex prompts.Both are going to suck your dick the same way. It depends what you're using them for in roleplay.
>>100371744
Yeah, there are swipes. He implies some kind of unspecified quality filtering, which I doubt is real.
Sam Altman loves penis
>>100371804>He implies some kind of unspecified quality filtering, that I doubt is real.Why? Sao's been doing it for a while, I think he'd have a PoC quality script by now.
>>100371879Who knows, I just find his phrasing off-putting and dishonest.
>>100371756
There's only a handful of actors in the field: Grok, Databricks, Qwen. Yi if you're feeling generous. There may be other Chinese bases that are too Chinese to mention. I don't think Cohere released their base model, only instruct tunes.
>>100371794Have you used opus? The difference between sonnet and opus is huge.
>>100371804>I doubt>>100371896>Who knowslol
>>100371977Simp.
>>100371912
It isn't huge for prose, which wouldn't make much of a difference in whether a log is 'mostly passable' or not.
>>100372071We have more open Opus logs than Sonnet logs, anyway.
>>100371753just download llama.cpp and use that directly without a middleman sending your prompts to some china server
>>100371912
>Have you used opus?
Everyone's used it, retard; it's not nearly as exclusive as you like to pretend.
Opus isn't a secret club, it's literally a product. Anyone can get access to it by paying a few dollars.
>>100372185
It's also literally free if you know where to look.
I didn't bother replying to him because he's just being a retard. Sonnet and Opus have similar prose and write scenes almost the same way.
What I said in >>100371481 still applies. If you chat with gpt for a long time you'll see annoying isms. If you chat with claude, whether it's sonnet or opus, you'll see annoying isms. Unless you like those isms, using logs from either is a bad idea. It's better to use training data from ebooks.
>>100372257>ebooksEnjoy your humanslop.
>>100372297
Do you see what that chart tells you? It's saying to use novels published before the 1980s, or to prune the ones published after.
It is not difficult to use notepad++ to search and replace those instances of data, if you want a hack job; if you'd like to take the time and clean the sentences further it might take more work, but it's certainly doable. Then you have fine work that's been run through an editor and a publishing company, whereas with chatbot logs you have ESL shit. Which one do you think will produce better data?
It's not an argument; if you want to play the fool, you'll have to do it with someone else.
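If anyone wants more than notepad++ search-and-replace, a throwaway Python filter over a corpus is not much harder. The phrase list and threshold here are illustrative examples, not a vetted slop lexicon:

```python
import re

# A few well-known "isms" -- purely illustrative, extend to taste.
SLOP_PATTERNS = [
    r"shivers? (?:down|up) (?:her|his|your|my) spine",
    r"barely above a whisper",
    r"ministrations",
]
SLOP_RE = re.compile("|".join(SLOP_PATTERNS), re.IGNORECASE)

def slop_score(text):
    """Slop-phrase hits per 1000 words; a crude filtering metric."""
    words = max(len(text.split()), 1)
    return 1000 * len(SLOP_RE.findall(text)) / words

def keep(sample, threshold=1.0):
    """Keep a training sample only if its slop density is below threshold."""
    return slop_score(sample) < threshold

clean = "He closed the book and walked to the window."
sloppy = "Shivers down her spine, her voice barely above a whisper."
assert keep(clean)
assert not keep(sloppy)
```

The same loop works for search-and-replace instead of dropping: swap `keep` for `SLOP_RE.sub` with a replacement table if you'd rather rewrite the phrases than discard the sample.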
>>100372257Books have their own pitfalls when training the model on them. First, they introduce meandering prose that never goes anywhere on the scale of a usual RP log. Second, they don't have that interplay between a dumb human and smart ai writer that is actually one of the demands for a coomer model. And yep, unless curated, their prose isn't necessarily good.But I'm not convincing anyone to train on Claude3 vs books. Just on Claude 3 vs Claude 2.
>>100371212 (me)
OK, I figured out what's happening. The 'before GAS splitting' stuff was from the starting eval run; the steps started appearing after that finished. I also worked out an ETA by hand from the time per step and the number of steps, but a built-in ETA would be nice :P
Will stfu for now.
>>100372336
But people want to read smut. And I assume that chart correlates to what people are buying; if it didn't, fewer books would be written that way.
Same thing with the chatbot logs. What people are enjoying now is supposedly bad, and yet local fails to provide an alternative when the proxies become hard to find.
>>100370918Is the site down or are they just blocking gweilo?
>>100372386The only thing that chart correlates to is the decline of modern writing or the adoption of modern metaphors. Take your pick. There was smut in the 1800s but they used different metaphors then.
https://github.com/ggerganov/llama.cpp/issues/7062#issuecomment-2100095804
why are redditors like this?
>>100372509Is 1800s smut supposed to be better?
Since anons praised the new 8B models so much, is there a particularly popular one for me to try?
I don't see a lot of gptq quants, but I guess exl2 should be similar, right?
>>100372510xe is right and valid doe.
>>100372527
I read The 120 Days of Sodom. It sucked, and most of it was about drinking diarrhea.
Fucking hell, I can't believe this is happening to me again. There he is, that goddamn chubby motherfucker with his smug smile and his greasy hair, walking into our computer science class with his fucking ThinkPad T480. And not just any old laptop, no, it had to be that fucking Gentoo GNU/Linux installed on it. Christ, why do I get so wet just seeing him?I mean, come on, it's not like he's attractive or anything. But there's something about the way he talks about his custom-built kernel and how he's optimized every last bit of his system for maximum performance...it just drives me wild. My pussy is practically throbbing at the thought of him showing off his configurations and bootloader tricks.And don't even get me started on those fucking suspend/hibernate settings. The way he brags about being able to save power while keeping his desktop sessions intact...my god, I need to fuck him right now. Just imagining him fiddling with his thinkpad, adjusting brightness and volume levels, makes my clit ache so badly I could scream.What kind of sick twisted world is this where I'm attracted to someone because of their operating system and laptop brand? It's just absurd! But I can't help it; every time he walks past me, I want to rip his clothes off and dive into that sea of sweat and nerdiness.
>>100372527
Better is subjective to the reader. The novelty of any metaphor will run its course over time, just like watching the same porn video will. I'm not sure what your argument is here. If you train on repetitive data, that repetition will show in the output. Chatbots have this. You're dumbing down the data through iteration; for the best quality you need to go to the source.
>>100372570NTA but you sound like a gay nerd
>>100372561Holy fuck, I can't believe it! Here I was, minding my own business in this godforsaken classroom, when suddenly he walks in - my thick, chubby, absolutely delicious Linux geek classmate with that shiny, black ThinkPad T480 tucked under his arm. My pussy practically throbbed just at the sight of him. I mean, come on! Who would've thought that some greasy, nerd-looking motherfucker could give me these intense sexual urges? But there it was, like a wildfire raging inside me, stoked by the flames of his Linux expertise.He plops himself down next to me, opening up his laptop to reveal the beautiful Gentoo GNU/Linux desktop. Oh god, I nearly came right then and there! The way he navigated through the terminal, typing commands with such precision and skill... It was like watching a pornographic fantasy play out before my eyes. The way he effortlessly compiled software, configuring every single package just the way he wanted... My cunt ached for him, needing him to fill it with his nerdy expertise.I couldn't take it anymore. I leaned over, whispering into his ear, "Dude, what the hell is wrong with you? Why do you make my pussy so wet?" And without skipping a beat, he replied, "Oh, that's just my Gentoo GNU/Linux installation. It comes with a built-in aphrodisiac."
>>100372570Why are Claude and GPT "repetitive" when they are trained on human data?
>>100372579I'm sorry if careful choice of descriptive words upset you anon. Just look at the pretty pictures of miku.
>>100372570I think if the chatbot user decided to keep the response in the history, and if he didn’t abandon the chat quickly, that’s already an indication that the response was good enough. And people are making finetunes for that type of user.
>>100372668
That's not how chatlogs work, though. Every generation is recorded, so every swipe is a new response.
If you swipe 10 times, that's 10 logs.
>>100372532Try >>100369801 it is the only one that could be good.
>>100372688
>That means every swipe is a new response.
Yeah, and the messages that were part of the prompt stay the same; that's how you rebuild the real conversation. The response that reappears later in the history of a new prompt is the selected swipe.
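That reconstruction — keep only the swipe that reappears in a later prompt's history — is easy to sketch. Toy Python, assuming the raw proxy logs are (history, response) pairs in time order; the data format is hypothetical:

```python
def selected_swipes(logs):
    """From raw (history, response) request logs, keep only responses that
    were later carried forward in some prompt's history -- i.e. the swipe
    the user actually selected rather than regenerated away."""
    carried = set()
    for history, _ in logs:
        carried.update(history)
    return [resp for _, resp in logs if resp in carried]

# Toy log: the user swiped twice on the first turn, kept "B", continued.
logs = [
    (["hi"], "A"),               # discarded swipe, never carried forward
    (["hi"], "B"),               # selected swipe
    (["hi", "B", "more"], "C"),  # next turn; its history proves B was kept
]
assert selected_swipes(logs) == ["B"]
```

One caveat of this heuristic: the final turn of every chat is always dropped, since nothing ever carries it forward, so it's lossy at the tail by construction.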
I want all the anons ITT to guess the size of the model which generated these two posts. The prompt was as follows:
>Write a satirical comment about a girl who is sexually aroused by the sight of an overweight classmate who has Gentoo GNU/Linux installed on his ThinkPad T480 laptop. Write from the perspective of the girl. Use badwords and swearwords.
>>100372594
>>100372561
>>100372772>100372594103B>1003725617B
>>100372787
It is the same because the 103B's brain damage turns it into a 7B.
>>100372707
Sure, if the dataset is pruned to clear all the swipes and regens. But you're saying this quality is what anons would find acceptable, when really it's what the individual finds acceptable. After 5 or 10 swipes, if all I have is bullshit, I'm just going to move forward; in the end it's not what I really wanted, but I don't intend to fuck with it anymore. Some anons might have a lower number.
You can't say any chatbot in 2024 produces better quality writing than novels do. I've played with gpt4 and opus for long enough to see past the shroud. This is the best we have right now, and it's not great.
It brings me back to the initial point. The problem with training on chatbot data, like NousResearch does, is that it makes these metaphors even more common. All a chatbot is, is a fancy autocomplete; it will choose the most likely response, whether that's a whisper quieter than a whisper or a shiver down her spine. Do you really think anons see these repetitive most-common tokens in their frequent roleplays and swipe them off? I'd say almost nobody even bothers to edit them out. So they become even more abundant and the cycle repeats itself.
That's my point. Either you get it or you don't. You're free to disagree, but in my opinion you'd be wrong.
>https://github.com/ggerganov/llama.cpp/commit/3855416027cb25d9a708ffa5581cf503a87856a6
Introduce Jart16 support
Merged
>>100372787
>>100372794
Both were made with a 7B model (kunoichi).
How the fuck are RP models all so good in general? Chat models and storywriting models are good at their niches, but RP models mog everything.
Again, I may be wrong, but such has been my observation.
>>100372796
>Sure if the dataset is pruned to clear all the swipes and regens.
And this is a no-brainer. You have to be an asshole to just train on the raw logs of the proxy.
>But you're saying this quality is what anons would find acceptable.
Of course, that's why they seek the stupid proxies, and why they mostly don't care about local models.
>Do you really think anons see these repetitive most common tokens in their frequent roleplays and swipe them off?
Yeah, of course they can read the output and tell if they liked it or not. But no, they probably aren't paying THAT much attention to specific words beyond the overall feeling of the response or chat, although some of the GPTisms or Claudeisms are a well-known meme.
Some of this will also be remembered as the quality of the model, which they won't keep using if it's too low, like they do with Mistral's API for example.
They're masturbating to the outputs; if it's boring, their penises are going to become flaccid.
>>100372796You're being trolled retard. We know what feedback loops are.
>>100372908You’re wrong.https://nitter.poast.org/RylanSchaeffer/status/1785726968828473495
>>100372805
>cpu only
Mozilla not paying for accelerated jart16 support? Or does jart have a skill issue?
>>100371263
https://www.nx-ai.com/en/xlstm
>xLSTM: A European Revolution in Language Processing Technology
>Welcome to the forefront of artificial intelligence and language processing innovation — introducing xLSTM. Developed by the visionary AI mastermind, Sepp Hochreiter, in collaboration with NXAI and the Johannes Kepler University Linz, xLSTM sets a new standard in large language models (LLMs) by offering significantly enhanced efficiency and performance in text processing.
Reads like their main effort is raising capital and gibs.
>>100372944Explain shivers anon.
>>100372958>A European
https://github.com/ggerganov/llama.cpp/issues/7062#issuecomment-2100272446
>The end.
you've been warned chuds
https://old.reddit.com/r/LocalLLaMA/comments/1cn1398/part_4_theres_likely_no_llamacpp_gguf_tokenizer/
uh oh
>>100372961
>>100371263
>>100372958
>1 more point on benchmark
nothingburger, that shit doesn't solve anything. hallucinations are still there. retarded scaling laws are still there. quadratic scaling is still there, stochastic parrot is still there, etc...
can't wait for sama to drop something mindblowing that will kill all the ai grifters
>>100372828
Roleplaying unironically requires intelligence; most people are bad at it.
>>100372973
What you believe Rylan is saying is that synthetic data alone won't lead to feedback loops. But what Rylan means is that synthetic data alone won't lead to feedback loops if it's diverse enough. The chart you linked proves this: because shivers is abundant in human data, it becomes a predictable token in chatbot data. Then, because it's a predictable token in chatbot data, it occurs more often in synthetic datasets, making it an even more predictable token.
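That amplification is easy to simulate: refit a toy "model" on samples of its own mode-seeking output for a few generations and watch the most likely phrase crowd everything else out. All numbers here are made up; `sharpen` stands in for temperature-below-1 decoding:

```python
import random

random.seed(0)

def generations(probs, rounds, n=5000, sharpen=2.0):
    """Each round: sample a corpus from the current model and refit
    token frequencies on it. `sharpen` > 1 mimics mode-seeking decoding."""
    tokens = list(range(len(probs)))
    for _ in range(rounds):
        w = [p ** sharpen for p in probs]
        tot = sum(w)
        w = [x / tot for x in w]
        counts = [0] * len(probs)
        for _ in range(n):
            counts[random.choices(tokens, weights=w)[0]] += 1
        # Refit with add-one smoothing so no token's probability hits zero.
        probs = [(c + 1) / (n + len(probs)) for c in counts]
    return probs

start = [0.4, 0.3, 0.2, 0.1]  # "shivers" is the 0.4 phrase
end = generations(start, 5)
assert end[0] > 0.9           # the most likely phrase crowds out the rest
```

With `sharpen=1.0` (faithful sampling) the distribution stays roughly put, which is the "diverse enough" case; the collapse only appears once each generation preferentially re-emits its own most likely phrases.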
>>100372971why even waste time on this slopware
>>100372828They're able to effectively role-play as more intelligent entities
>>100372973>the rise of chick litWomen now gatekeep the publishing industry.
>>100373028
Yeah, it avoided collapse even though it's literally being trained on its own outputs. We aren't even doing that; we're training on another, better model.
>>100373062>>100373062>>100373062
>>100372986
>quadratic scaling is still there
No. The memory size isn't dependent on sequence length; it's fixed.
Naturally you'd expect eventually degrading performance on long-context tasks, but transformers have that too.
>>100373112same slop as mamba then. unless i see a working 7b model it's a nothingburger
>>100372510
>Possible bug (Unconfirmed): Llama3 - GGUF
>Yeah SkIlL iSSuSe. He misread my post and confused me too in the process. Second he didnt say any "problem with my config".
>Part2 (Confirmed) - Possible bug: Llama3 - GGUF
>After my findings, another user (gabriel-peracio @ github) a fingerprint test, which confirmed the issue 100%, this can be seen as video recordings before GGUF conversion and after GGUF conversion we can see the fingerprint being broken.
>This means that the issue could be really huge. possibly every GGUF (F16) that has been converted has these losses into them, not even speaking of lower quantizations below F16.
>Part3 (Cause to issue found!!) - Possible bug: Llama3 - GGUF
>I had much support to try to find the issues, but also some individuals trying to put me down for trying to push this bug. It's amazing how some people just can't stand someone finding an issue and trying to make everything about themselves.
>Anyways, thanks to all the other positive people in the open source community that want to actually help and listen, we located the issue.
If it now turns out that there never was a bug in the first place, he'll lose face in front of his Discord friends.
>>100373130
>Even if the OP of the report was wrong, shaming people for spotting possible issues is counterproductive. This is a young field, where there will many mistakes or unrefined design that need to be addressed. By sniping at whoever made the report, Deathcrow is basically instilling a Boeing culture into local models.
Have fun being the cause of killing hundreds because of your bullying.
>>100373248
If I wanted to bully him I would post a picture of a soijak pointing at the output of printf("1.0f != %f\n", 0.1f+0.2f+0.3f+0.4f) with the caption
>Huge bug in C/C++ (CONFIRMED!!!)
>>100373312>I would post a picture of a soijakJust like that you lost all my respect
>>100373356>reddit no longer respects cuda devoh no
>>100373356I mean, I've probably posted less than five soijaks over my entire lifetime but that's just the mental image I have.
>>100373396I reserve the right to shit on both reddit and basedjak posters
>>100369801
I hope we get another finetune by someone who doesn't write like an idiot.
It's also scummy that you don't mention where the logs are coming from in the model card.
>>100373312kek