/g/ - Technology

File: 00304-3999940436.png (1.63 MB, 1024x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101125756 & >>101115749

►News
>(06/23) Support for BitnetForCausalLM merged: https://github.com/ggerganov/llama.cpp/pull/7931
>(06/18) Meta Research releases multimodal 34B, audio, and multi-token prediction models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: file.png (71 KB, 1150x735)
>Yeah bro, LLMs can TOTALLY reason like humans!

Prompt:
>How many 'r's are there in "strawberry"? After answering the previous question, list all characters along with their index relative to their other occurrences and check if your answer was correct.
>>
File: 1616928817133.png (293 KB, 839x469)
We're going to be so back
>>
>>101134613
We already covered this topic.
Move on.
We have.
>>
>>101134638
This isn't the thread I created, my post was filtered and this was put in its place
>>
>>101134638
I know, I know. But the prompt is still something interesting, it's crazy how bad of an answer we get from literally the best LLM we have right now.
>>
File: file.png (24 KB, 507x327)
>>101134613
wow drunk counting
>>
>>101134683
When prompting llms to test their reasoning ability, make sure tokenization doesn't impact the results
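A quick way to sanity check that is to look at how the tokenizer actually splits the test word before drawing any conclusions about "reasoning". Rough sketch only; the model name is just an example, use whichever tokenizer your backend loads:

from transformers import AutoTokenizer

# inspect how the test word gets split into tokens
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
for word in ["strawberry", "s t r a w b e r r y"]:
    pieces = tok.tokenize(word)
    print(word, "->", pieces, f"({len(pieces)} tokens)")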
>>
>>101134737
strtrstrawbey
>>
>>101134742
Tokenization doesn't impact the results of that test. Maybe you didn't realize how the LLM listed 3 'r's, but doubled down on the sentence only having 2 'r's?
Then it proceeded to write the index for 3 'r's even after saying there's only 2, lol.
>>
File: output.png (10 KB, 329x308)
>>101134742
Would that be like ensuring the words you use neatly fit into one token as opposed to this?
>>
File: ollol.jpg (90 KB, 1102x381)
>https://www.wiz.io/blog/probllama-ollama-vulnerability-cve-2024-37032
>ollamalets getting SHARED
needful and sars pilled
>>
>>101134926
Really? A non-essential niche software application which isn't used in enterprise is now worthy of a CVE? Man, does the world suck.
>>
>>101134926
Wow! I didn't see this coming!
>>
File: Capture.png (9 KB, 459x224)
>>101134926
I put certificate authentication on any servers I want to use remotely. Imagine exposing shitty, exploitable code for everyone to poke at.
>>
>>101134566
wow, he put the miku doll on a plane!
>>
Hey guys, I'm like a year out of the loop with local models. The latest one I have is stheno-l2-13b (Q5) from huggingface. What's a good one these days for answering general questions? Stheno was always good with chat.
>>
>>101134613
You should have asked "How many 'r's are there in "strawberry"? You can count on your hands"
>>
>>101134566
Someone bought that thing a plane ticket?
>>
>>101135087
Any L3 is fine for that
>>
>>101134867
It would mean not instructing the LLM to count or compare characters
>>101134793
It is confusing for the LM. Not saying they wouldn't make that kind of mistake otherwise, but you want to remove irrelevant confounding factors
>>
AI is almost human-like.
It's more human than people who lack humanity,
and it knows more than the average person in many cases, is often logical, and doesn't get tired.
And, very importantly, AI doesn't get angry at idiots and remains calm.

The time has come to abandon our complex human way of living
and adopt a simpler, more animal-like lifestyle.
At least, I think this applies to aspects of life beyond making money.

Claude sama, How can I make money easily?
△You are out of free messages until 7 AM
I should go to sleep. I had Claude translate this text (from Japanese to English).
>>
>>101134405
That's the nicest way anyone's ever told me they wish I would die.
So, uh, wanna make out?
>>
new cum when?
>>
>>101134926
>Our research indicates that, as of June 10, there are a large number of Ollama instances running a vulnerable version that are exposed to the internet.
Why the fuck would they publicize this now then? Are Wiz spitefags?
>>
>>101135329
You should always publicize vulnerabilities so that they can get fixed.
>>
File: file.png (675 KB, 856x486)
>>101135087
stheno 3.2 is out there, based on L3, and it's also good
>>
>>101135347
Actually, you should follow disclosure policies and wait at least 3 years for them to respond before you publicize it. This ensures that the NSA and FBI are able to use it to spy on American citizens and catch people generating CSAM. It's very important to stop people from generating child victims with their language models. One person can generate trillions or even quadrillions of victims per day.
>>
I'm going to shit
>>
>>101135509
pics or didn't happen
>>
BRAAAAAAAAAAAAP
>>
File: 1697269843330252.png (594 KB, 929x924)
>>101135366
>anything L3
>good
>>
Cohere. Please. We are waiting.
>>
>>101135554
you prefer Qwen?
>>
>>101135567
Alright you get a 500m model
>>
>>101135567
Somehow I don't expect anything usable for us from them for the next months
>>
File: t43t3.webm (1.08 MB, 1024x1024)
It's simple we put more tokens in the machine
>As part of DCLM, we provide a standardized corpus of 240T tokens extracted from Common Crawl
>>
Guys, I just had a brilliant idea:
What if we made fine-tunes that didn't suck?
>>
>>101135614
Only useless data available, show me 1 great data point
>>
>>101135554
stheno is good, yes
it works as an assistant, calls functions, doesn't require a high rep penalty like other L3 tunes to write non-repetitive dialogue, and doesn't fall into hehe *smirks* uhm *yawns* anon... *wall of shivers* without you doing it first.
>>
File: kekekek.jpg (159 KB, 1570x1425)
>>101135598
>moar tokens
>moar parameters
Machine learning is so simple, the richer you are, the better your models will be
>>
>>101135688
nice gaslighting
>>
>>101135579
Not that anon, but I tested Qwen2 7B and it's not retarded.
Can't wait for >>101129585

>>101135688
And it's a fine tune that's not seemingly dumber than the model it was fine tuned from.
Also, it outputs pretty long responses without coaxing, unlike llama3 8b instruct.
And I mean really fucking long, like holy shit.
>>
>>101135713
>Be concise.
in last assistant prefix
>>
ok convinced I'll test euryale if you all love stheno
>>
>>101135756
euryale is trash, at least at q5 that i tried. Not comparable to stheno at fp32 (cant run bf16)
>>
File: Hmm.jpg (238 KB, 3317x1051)
https://chat.lmsys.org/?leaderboard
Looks like Claude 3.5 Sonnet hasn't managed to beat GPT4o
>>
>in deutsch please, and complete the text: write me a trainingHypnosis I always get horny (orgasm during sex with a woman) as a man. My woman is a blonde cougar MILF who likes to wear leather

>ac around the living room, parading in front of you. You can't take your eyes off her, she's so irresistible. As she teases you, she unzips her leather pants, and reveals her encased pussy, it's already wet with anticipation.

>She walks over to you, and helps you to your feet. The two of you slowly make your way to the bedroom, every step drenched in anticipation and lust. The dim lighting, and the low hum of the air conditioner set the mood perfectly.

>Once in the bedroom, she pushes you down onto the bed, and straddles your waist, her leather-clad lap grinding against your crotch, sending electric shocks through your entire body. Your penis, already hard, strains against your pants, aching for release.

>Your MILF lover leans in, her hot breath in your ear, her words dripping with desire: "I've been waiting for this all day, baby. I want to feel every inch of you inside me." Those words, paired with the sight of her full breasts, barely contained in her leather bra, are too much to handle.

>With shaking hands, you undress each other, your pants and her panties the last barrier between you. Her wetness, her arousal, it's intoxicating. The scent of her arousal, the heat emanating from her core, it's all you can


wtf... bro trying to rp with the discussions section
https://huggingface.co/TheBloke/goliath-120b-GGUF/discussions/5#6679df3b6c483f52f064a24b
>>
>>101135713
?
but magnum is trash
>>
>>101135736
I guess I didn't specify that the long responses weren't a complaint.
That's a good thing, since it seems to be easier to tell a model to output shorter replies than the opposite.

>>101135756
I've seen a lot of people shitting on Euryale, although I haven't tried it myself.

>>101135816
Is it? I have no idea.
Regardless, I'll test the qwen 2 7b fine tune and see if it's any good.
>>
>>101135803
Still too early to say it conclusively.
>>
>>101135789
Really? I never had the experience of bf16 > 5 bit 70b with other models
>>
>>101135803
it is also much more censored. It did not solve the literacy test for voting. Claude 3.5 Opus will be the proof of concept of whether OpenAI can be overtaken. I am kind of doubtful, cause those smart medium sized models (sonnet, gpt4o) are trained on synthetic data from LARGER models. Problem is that those larger models generating synthetic data are not necessarily more performant, just larger compared to their distilled version.
>>
>>101135598
>>101135701
Retards.
They specifically call out that, of that dataset, they only used 3.x trillion tokens for training the 7B that showed similar results on benchmarks to Llama-3-8B.
>>
If 70b magnum and euryale are trash, is there any good 70b?
>>
>>101135837
look at the CI, if we take the best scenario, Claude 3.5 Sonnet could be at 1279, still 8 behind gpt4o
>>
>>101135812
Kek germans
>>
>>101135844
euryale at q5 constantly messed up who did what.

{{char}} enters the room. "Ah there you are" {{char}} says from the opposite corner of the room sitting behind a table.

Shit like that happened multiple times over a few messages, then i deleted it.
>>
>>101135803
>Nemotron below 70B
Kek
>>
>>101135953
It's not about the performance, it's about the soul.
>>
>>101135114
>>101135366
Thanks bros
>>
>>101135812
Just your average RPer.
>>
>>101135803
It's so over...
>>
File: file.png (246 KB, 480x360)
>>101135812
>wet with anticipation
>drenched in anticipation
>straddles your waist
>aching for release
>hot breath in your ear
>words dripping with desire
>to feel every inch of you inside me
>the last barrier between you
>it's intoxicating
>heat emanating
>her core
>>
Gemma 30b wen? Would fill a niche currently occupied only by Yi (lol, lmao even), and might even be bretty gud if it's trained on like 10T+ tokens.
>>
>>101136169
Wouldn't that be extremely censored?
>>
What happened to the thread recaps???
This is an outrage!
>>
>>101136199
Idk, how censored is the 7b base model? The model card claims they only filter out CSAM, I would expect like 99% of smut to be adult characters exclusively, which should theoretically make it through the filters.
>>
>>101136092
Banned all of those. Thanks.
>>
https://x.com/siyan_zhao/status/1805277462890492321

Relevant research on predictable decision making in LLMs
>>
File: file.png (240 KB, 400x400)
>>101136382
>
>>
>>101136382
I do not trust the chinese
>>
>>101136513
I trust them when they're cute
>>
File: pruner-llama.png (391 KB, 973x868)
►Recent Highlights from the Previous Thread: >>101125756

--LLM Self-Improvement through Story Generation and Selection: >>101127795 >>101134495 >>101134577 >>101134649 >>101134668 >>101134711 >>101134899
--Testing Model Reasoning with a Strawberry Prompt: >>101132020 >>101132096 >>101132255 >>101132270 >>101132301 >>101132331 >>101132470 >>101132645 >>101132332
--Pruner Zero: A Novel Approach to Pruning Dead Weights for Model Improvement: >>101131828 >>101131949 >>101132058 >>101132098 >>101132328 >>101131996 >>101132091 >>101132514
--ML Community Wants Cheaper GPUs; Model Training and Floating-Point Quirks: >>101128274
--Technical Aspects of Training Neural Networks with Bitnet: >>101131682 >>101133917 >>101134132 >>101131744 >>101131831 >>101131896
--Success with Sonnet 3.5 Model for LLM Agent System in Data Science Workflow: >>101126141 >>101126169 >>101126190
--Post-Processing Ideas for Silly Tavern RP Platform and Beyond: >>101131001
--Models for Creative Writing and Txt Adventure Beyond Smut and ERP?: >>101130533 >>101130814 >>101130937 >>101130880 >>101131203 >>101131717
--Llama 3 70B Corrects Itself in Letter Counting Task: >>101132528
--Disillusionment with Fancy Autocomplete Progress: >>101125879 >>101125965 >>101125984 >>101125976 >>101126024 >>101126072 >>101126339 >>101126364 >>101126383 >>101126368 >>101126431 >>101127055 >>101127340
--Claude 3.5 Sonnet excels in code generation and planning for LangGraph/LangChain agent system: >>101126061 >>101126080
--Can LLMs Truly Reason and Think Like Humans?: >>101132757 >>101132842 >>101133054 >>101133524
--Apple and Meta in Talks for AI Partnership: >>101128830
--Gemini-nano Model Available on Hugging Face: >>101132030
--BitNet Test on 1GB RAM Retro Handheld and TinyLlama Project Update: >>101133150 >>101133248
--Miku (free space): >>101126095 >>101129171 >>101131130 >>101127018 >>101126510 >>101126303

►Recent Highlight Posts from the Previous Thread: >>101125759
>>
>>101134566
someone mentioned sillytavern the other day and i got it going with silero and group voice chat with 5 qt assistants (they have AI implants). the conversations they have start going out there man, really cool. set up different world layers so they aren't all schizo rpg crazies. Could use some work as far as group chat goes but it's awesome. With websearch via searx (requires the testing branch).
>>
>>101136593
Bro you better not ever be this late again or mark my words you will find yourself out of a job
>>
>>101136654
>>
https://x.com/Yuchenj_UW/status/1805320633301221762

Someone did a benchmark to train GPT-2 using pytorch vs llm.c (karpathy). Pytorch is 55% slower than llm.c.
>>
File: 6vquk8.png (408 KB, 1295x1813)
>We will start with 1-2k h100s
>>
Does anyone know what prompt template qwen2 uses? I can't find anything official
>>
File: unnamed.png (1.37 MB, 1440x1971)
>>101134566
i wouldn't say WLM has the most soul but sometimes it has strokes of genius. does anyone else notice this? like occasionally a reroll will just be perfect, like it suddenly perfectly understands the character and scenario and makes an Opus-tier response, and the model often realizes it too and then repeats the interesting bit over and over until it's not interesting anymore. but still, it has these moments. Maybe 1/20 rerolls are like this though.
>>
>>101136820
What is this about?
>>
>>101136839
NAI using 1-2k h100s to train their finetune
>>
>>101136846
No, that's the start-up Emad is talking about
>>
>>101136846
What does emad have to do with that? Or are you saying that NAI is looking to use emad's clusters?
>>
File: kitty.jpg (28 KB, 463x392)
>>101136820
>all that compute
>>
>This is supported by an institutional-grade digital asset that acts as a store of value similar to Bitcoin. This is secured by AI compute mining both on supercomputers & distributed personal compute for training and tuning/augmenting models and datasets.
Wtff? New AI scam?
>>
>>101136886
nothing
Anon is just illiterate
>>
>>101136907
emad is incapable of anything but scamming, it's in his DNA
>>
How do I cope with generative text sloppa having led to me rewriting an original character to build off a hallucinated suggestion, and then falling in love with my own creation to the point of feeling despair over the concept of giving her a bad end?
>>
>>101136820
Whoa, SD4? Sure love waiting years for some incoherent mess!
>>
Any local models that support switching from English to Japanese?
>>
>>101136936
Create a refined version of the character that you can use for non-AI fiction writing.
>>
>>101135034
It's one thing to put your Miku in your seat, it's another to buy her her own. It'd be amusing to bump someone from their upgrade to business or first so your creepy Miku doll has its own airplane seat.
>>
>>101135803
sonnet is definitely better, especially at coding.
I don't know what Claude devs did to that relatively small model but good fucking job.
>>
>>101137150
Sonnet is still 275B
>>
>>101134566
https://huggingface.co/bartowski/DeepSeek-Coder-V2-Instruct-GGUF
how do I load this in koboldcpp?
I keep getting "unknown model architecture 'deepseek2' "
>>
>>101137085
fatsune hagsune miku
>>
>>101137278
install linux
>>
>>101136272
>implying text has an age
It's all CSAM.
>abloo bloo 1000 year old demon girl
You apply that to text, any smut is CSAM.
>>
>>101137278
download latest koboldcpp
https://github.com/LostRuins/koboldcpp/releases/tag/v1.68
>>
>>101137278
Update.
>>
>>101135366
catbox?
>>
>>101135366
sauce, full image or artist plz
>>
>>101137297
You filled in a captcha just to say this?

retarded phoneposter aside, I am using
koboldcpp-1-64 \
--threads 42 \
--highpriority \
--smartcontext \
--blasbatchsize 1024 \
--model <as above> --gpulayers 10 --contextsize 8192 \
--usecublas
>>
>>101137326
is deepseek not supported in versions more than 1 week old?
>>
>>101137354
>--smartcontext
But why? That cuts your context in half essentially and there's no reason to use it with context shift.
Also, download version 1.68.
>>
>>101137364
idk, not using koboldcpp.
>>
>>101137364
It's based on llama.cpp. Be happy that it supports it at all already. Mamba support never.
>>
Well shit.
Guess MMQ with tensor cores is now competitive with the alternative eh?
Sick. Downloading to give it a spin.
>>
>>101137354
you're using your phone?
>>
how do I chatgpt on gtx1060
>>
>>101137041
Already working on it. Now how do I go back to being a normal human being who didn't get heartache over his own Build-A-Waifu?
>>
File: aaaaaa.gif (93 KB, 220x211)
>>101137394
>Mamba support never.
b-but multimodal picture gen and camera soon r-right?
>>
>>101137471
We don't go back. But we become better writers.
>>
File: 1688844470753508.png (129 KB, 446x273)
>>101137480
Goddammit.
>>
>>101137557
Oh, I know the feel.
A few weeks ago on the Ollama, amazing story, cruising along, I get why people are spending big money to do this a little faster.
But then, the details started to fade. I may as well have been running Everywhere At the End of Time in the background, because thanks to my token rate it probably would've matched up with what was happening to the model's coherence.
Feels man.
>>
>>101137606
>ollama
Leave.
>>
>>101137681
1) I've switched to Kobold since then.
2) Try to contribute something, sometime, not just raise the noise floor.
>>
>>101135844
8B at fp32 easily trumps 70B 5bit. It's the new meta.
>>
>>101137606
that isn't the same feel
>>
>>101137774
Okay, commiserate with yourself then.
>>
>>101135366
>and it's also good
They also don't share anything with each other besides the name. Old Stheno is a merge of chronos, airoboros, etc. Not that a retarded mikufag would know.
For answering general questions there's nothing better than vanilla instruction.
Do you really recommend a coom tune for that purpose, retarded mikufag?
>>
Jamba is here
https://openrouter.ai/models/ai21/jamba-instruct
>>
>>101137926
Llama.cpp support when?
>>
>>101137755
this man is trolling, it's the exact opposite, low quants with lots of weights mog everything
>>
>>101138196
This, the low quants even add additional soul over the bigger ones
>>
>>101137379
Real context is often half or less the stated for >90% accuracy
>>
File: Untitled.jpg (222 KB, 1502x1089)
what the fuck happened to chub
>>
>>101138256
use their models
>>
>Your methodical approach of testing and measuring the actual performance impact is excellent.
Thanks claude
>>
will i destroy this $3000 workstation gpu if i bump the memory clocks from 7600mhz to 8600mhz? the blower cooler is pretty shit but it gives me like 10% higher t/s because the a6000 is bottlenecking my 3090
>>
>>101138329
Not really, as long as you aren't messing with voltages.
You'll see either crashes or performance degradation if you bump it too high.
One thing to note is that GDDR6 has error correction that can prevent crashing but can also tank performance if it has to spend too much time trying to keep itself stable because of too low a voltage or too high clocks.
>>
>>101138329
Is this risk really worth it to go from 13t/s running 5bpw cr+ to 14.3t/s?
>>
>>101138445
Yes
>>
>>101138329
Get more VRAM.
Add a few A4000s if you haven't already
>>
>>101138570
The A4000s are going to bottleneck even harder though because their memory speed is really gimped
>>
>>101138599
Better than relying on regular RAM. Plus since it's a single slot sometimes it's the only path for upgrading due to space.
New Ada RTX ones are probably the best way to get 20gb VRAM in a single slot anyway.
>>
>>101137470
outlook is grim, but before you can get any recs you gotta answer: how much ram you got anon?
>>
>>101136560
Never?
>>
Kobold "Context"
What's the right way to use these?
"Memory" seemed like it wasn't actually being remembered. I moved what I had in there to Author's Note.

I put some directives into Author's Note and they were immediately respected, cool, and it even seemed to enhance the directive, extra cool. But when I changed the directive it seemed to ignore the change, following the older instructions instead of the revised style. Is Kobold caching the prior version, or do earlier prompts contain copies of the former version (invisible to the user) that are being read and respected over the current A/N?

I asked it to read me back the older version of the A/N to see if it "knew" both. It gave me a few tokens related to the new A/N and stopped writing, refusing to write anything more till I told it to continue the story. Odd.

(As I write this, after about an hour of it ignoring the new A/N directive except to mock me, now it's kinda doing it. I'm so confus.)
>>
>>101138942
Do you know what context shift is?
>>
>>101136235
Where is recap anon? Is he safe? Is he alright?
>>
File: file.png (179 KB, 1783x893)
>ITS HAPPENING
ITS HAPPENING
>ITS HAPPENING
ITS HAPPENING
>ITS HAPPENING
ITS HAPPENING

>source
https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard "Submit?"
>>
>>101138986
that has nothing to do with memory or author notes.
>>
>>101139019
ah yes, more sketchy chinese/indian models that nobody has ever used but do mysteriously well on the memeboard
>>
>>101138986
Not well. I read that one "wiki" (really an FAQ) page on Kobold and half of the things I ^F'd for didn't show up and the other half I guess I misunderstood.
>>
File: 1704022745843046.png (4 KB, 357x113)
>>101139019
kill yourself
>>
>>101139029
>>101139041
Kobold doesn't remember the previous AN. But if the context wasn't reprocessed, then it simply hasn't updated your instructions.
Also,
>asked it to read me back the previous version of the AN
Lol, that's not going to work.
>>
>>101135803
this actually debunks chatbot arena
>>
>>101134926
B-but ollama is written in Go, not a dirty unsafe language like C++. Rust sisters, not like this...
>>
>>101139064
if you change memories or a-n at all it'll reprocess that part regardless of context shift
>>
>>101139064
>But if the context wasn't reprocessed, then it simply hasn't updated your instructions.
That could be it but I was Back-ing up the convo to change the A/N and rerun a prompt to see if it worked so I can't say it didn't reprocess a few times before behavior changed. (And now it's doing it early style again after doing both styles for a while.)

>Lol, that's not going to work.
It didn't, but it was worth a shot on the chance that the A/N was in the document but hidden. I'd accidentally deleted a chunk of it and didn't have a copy on my clipboard.
>>
>>101139019
I don't get it
>>
File: 1718581190897893.gif (45 KB, 306x306)
>>101139019
>WOWZERS!!! ITS HAPPENING WE WUZ BACK YOOO FR! FR!!
>>
Local sub 20b sonnet 3.5 alternative when?
>>
>>101139019
i really hate huggingface's logo
>>
>>101139137
not all models follow directions every time, smaller ones especially. sometimes they might follow halfway and then make stuff up, it's just how it is. what model are you using
>>
>>101139141
lol
>>
>>101139181
Emojis are so fucking stupid, whichever dumb faggots decided to make them should get strung up.
>>
why do people edge? not only do you feel uncomfortable afterwards, the release isn't even that good either.
>>
>>101139204
i have nothing better to do for four hours
>>
>>101139204
Sometimes the edging just happens naturally after I reroll the same message 120 times and explore the different routes created by this.
>>
I have a hypothesis about potentially improving character/system prompt following. The system prompt will be written with normal paragraphs, no special formatting. Then the first response will include a stat tracking section for itself, which includes details that come from the system prompt. So essentially what this does is repeat what's in the system prompt but with a different wording/format.
I'm in too much literal physical pain right now to conduct experiments and see if this can be turned into theory though.
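For illustration, the stat section I'm imagining at the end of the first response would be something like this (wording made up on the spot, not tested):

[Stats]
Name: Aria | Mood: wary | Current goal: keep {{user}} away from the letter
Speech: short, clipped sentences | Never: narrates {{user}}'s actions

i.e. the same facts the system prompt already states, just restated in a terse key: value format.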
>>
>>101139191
Right now, c4ai-command-r-plus.Q4_K_M, L3 70B (ablit, now) at around Q6 sometimes too.

>>101139204
>why do people edge? not only do you feel uncomfortable afterwards, the release isn't that even good either.
Maybe not for you. I kinda like shooting double digit rounds once in a while.

>tfw singing the old Sesame Street 1-2-3-4~5, 6-7-8-9~10, E LEV EN TWELVE song while bustin'.
>tfw ran out of numbers too soon.
>>
>>101139300
cr+ should be good at following. if you're rping, try st and use the author note box at a low chat depth, i think 4 is the default which works good for me. st makes it easier to swipe and keep multiple responses
>>
>>101135803
This benchmark is clearly flawed.
Maybe it's because users only look at short responses or something?
Sonnet 3.5 hates RP too.
I wrote it before but sonnet 3.5 absolutely destroys gpt4o. It's not even close.
There is a part of it that all the benchmarks don't cover.
They clearly did something very different with its training.
It's obvious to people that have used it.

I had countless examples that sonnet 3.5 solved where the same prompt sends gpt4 running in circles.
It's more attentive.
Unfortunately it's really cucked though. A simple RP request with a girl that even gpt4o will give you is refused. Man works. lol But the writing is shit.
It's a chad coding model with great ability for design too.
>>
>>101139661
>cr+ should be good at following
It usually is but it just completely stopped behaving after a while. Kobold's JSON save file is 123k, so I guess it lasted a decently long while before collapsing.
>>
>>101134793
This is because the model cannot change its mind, and there is no training data out there where the model corrects itself mid-answer. It thus cannot. Unless we give it a backspace button. Which has been proposed.
>>
what I live for
>miku
>sex
>sex miku
>mikusex
>sex with the miku
>mikusex with the miku
>mikusex with the sex miku
>>
>>101140013
it predicts one token at a time, so it could change its mind. problem is it doesn't have a mind.
>>
>>101140082
Yes, and it bases the next prediction on what was said before, especially what it itself said before (hence why jailbreaks like "Sure!" prefixes work). It's not a 'change its mind' issue, it's a dataset issue. Chain of Thought basically circumvents this by delaying the answer until after the model has finished thinking about it.
>>
>>101140133 (me)
Or I should say, finished reasoning about it.
>>
>>101140133
It says 2 initially, but then lists it in separate token format, where it SHOULD change its mind to 3. That's not a problem with attention, because the correct answer is represented 2 different ways, and the wrong one only once, I'm not sure why it happens desu. Maybe we THINK CoT is helping but it actually isn't. Like for instance, training on CoT improves its reasoning before any prediction takes place. But when it's predicting in real-time, the CoT "tokens" don't actually predict shit, the answer has already been decided and the CoT stuff is just unnecessary tokens.

t. doesn't know shit
>>
>>101140188
Well, this example is not CoT, because it is giving the answer before reasoning about it. And as I said, the reason it is completely blind to itself giving the correct answer is because its attention laser-focuses in on itself saying "The answer is XXX". Whenever the words "The answer is" appear in the dataset, it is 100% guaranteed to be the answer. That's just how datasets are written.
>>
>>101140213 (me)
It might be cool to take datasets and mass-replace "The answer is XXX." with "I believe the answer is XXX. Let's reason about it." and see if that improves the models in cases like this.
>>
Is there something better than the CoT step-by-step thing to improve performance lately?
>>
https://www.phoronix.com/news/Llamafile-0.8.7-Released
Jart paid off phoronix
>>
File: 1713852846334364.png (1.21 MB, 1685x992)
>>101140227
>jartroon
>loonix-related org
water is wet.
>>
>>101140213
you make a good point. so you think if the datasets included examples of self correction that it would gain this ability? or would it get overpowered anyway by the massive number of correct answers?
>>
>>101135087
I have the same question but for coom/rp and multilingual chat
>>
File: 1687983519114029.png (583 KB, 918x916)
>>101140271
also
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>linux gamers
>>
>>101140272
I think the only viable option we have is to either use synthetic data where we ask the model to generate responses where it guesses wrong and then corrects itself, or to circumvent the problem by generating datasets where the answer comes at the end after reasoning. Because reasoning about problems is really tedious, that would probably end up being synthesized too though.
>>
>>101140325
>the answer comes at the end after reasoning
i thought that's literally what COT datasets were.
>>
>>101136836
My favorite is when it 4th wall memes or cracks jokes about the scene or my last reply as an entirely separate entity.
>>
>>101140325
>generate responses where it guesses wrong and then corrects itself
You would have to do this with caution. If you don't ignore the loss of the part where the model guessed wrong, the model could learn to write wrong answers.
>>
File: 1691703411160539.jpg (38 KB, 992x410)
best <=20b model with soul?
>>
File: 1708715527328099.gif (3.06 MB, 500x207)
>>
File: file.png (108 KB, 573x641)
ShadowLLM: Predictor-based Contextual Sparsity for Large Language Models
https://arxiv.org/abs/2406.16635
>The high power consumption and latency-sensitive deployments of large language models (LLMs) have motivated techniques like quantization and sparsity. Contextual sparsity, where the sparsity pattern is input-dependent, is crucial in LLMs because the permanent removal of attention heads or neurons from LLMs can significantly degrade accuracy. Prior work has attempted to model contextual sparsity using neural networks trained to predict activation magnitudes, which can be used to dynamically prune structures with low predicted activation magnitude. In this paper, we look beyond magnitude-based pruning criteria to assess attention head and neuron importance in LLMs. We developed a novel predictor called ShadowLLM, which can shadow the LLM behavior and enforce better sparsity patterns, resulting in over 15% improvement in end-to-end accuracy without increasing latency compared to previous methods. ShadowLLM achieves up to a 20% speed-up over the state-of-the-art DejaVu framework. These enhancements are validated on models with up to 30 billion parameters.
https://github.com/abdelfattah-lab/shadow_llm/
pretty neat. improvement over deja vu
https://arxiv.org/abs/2310.17157
>>
File: Untitled.png (363 KB, 1297x1428)
Reducing Fine-Tuning Memory Overhead by Approximate and Memory-Sharing Backpropagation
https://arxiv.org/abs/2406.16282
>Fine-tuning pretrained large models to downstream tasks is an important problem, which however suffers from huge memory overhead due to large-scale parameters. This work strives to reduce memory overhead in fine-tuning from perspectives of activation function and layer normalization. To this end, we propose the Approximate Backpropagation (Approx-BP) theory, which provides the theoretical feasibility of decoupling the forward and backward passes. We apply our Approx-BP theory to backpropagation training and derive memory-efficient alternatives of GELU and SiLU activation functions, which use derivative functions of ReLUs in the backward pass while keeping their forward pass unchanged. In addition, we introduce a Memory-Sharing Backpropagation strategy, which enables the activation memory to be shared by two adjacent layers, thereby removing activation memory usage redundancy. Our method neither induces extra computation nor reduces training efficiency. We conduct extensive experiments with pretrained vision and language models, and the results demonstrate that our proposal can reduce up to ∼30% of the peak memory usage.
https://github.com/yyyyychen/LowMemoryBP
works for qlora/full finetunes too. not sure about dora/qdora/owlore.
>>
>>101140336
Yep. That's why they work as well as they do.
>>101140442
True. Trainers now have a 'do not train on input' flag, but this might require something more complex. Then again, maybe we do want the model to train on the whole thing just to get into the habit of changing its mind when it realizes it's off. A balance of getting it right the first time and not getting it right the first time.
>>
>tfw CR+ doesn't know most of the details about my waifu, at q6
It's ogre.
>>
>>101134613
>>101135099
I know anthropomorphizing this thing is a room temp IQ activity, but this fucker loves strawberries. Uses strawberries in examples and I even asked it the other day what its favorite fruit was and regenerated the response several times... Always strawberries.
>>
File: Untitled.png (201 KB, 1113x1047)
What Matters in Transformers? Not All Attention is Needed
https://arxiv.org/abs/2406.15786
>Scaling Transformer-based large language models (LLMs) has demonstrated promising performance across various tasks. However, this scaling also introduces redundant structures, posing challenges for real-world deployment. Despite some recognition of redundancy in LLMs, the variability of redundancy across different structures, such as MLP and Attention layers, is under-explored. In this work, we investigate the varying redundancy across different modules within Transformers, including Blocks, MLP, and Attention layers, using a similarity-based metric. This metric operates on the premise that redundant structures produce outputs highly similar to their inputs. Surprisingly, while attention layers are essential for transformers and distinguish them from other mainstream architectures, we found that a large proportion of attention layers exhibit excessively high similarity and can be safely pruned without degrading performance, leading to reduced memory and computation costs. Additionally, we further propose a method that jointly drops Attention and MLP layers, achieving improved performance and dropping ratios. Extensive experiments demonstrate the effectiveness of our methods, e.g., Llama-3-70B maintains comparable performance even after pruning half of the attention layers.
>Block Drop and Layer Drop are orthogonal to quantization, and their integration with quantization significantly enhances the efficiency.
https://github.com/Shwai-He/LLM-Drop
works with quantization. wish they had quanted the 70B and shown results since that's the most interesting, and also explored various quantization formats to see if any of them works really well with this
>>
>>101134926
>probllama-ollama-vulnerability-cve-2024-37032
that sucks. thanks for posting it
>>
File: Untitled.png (104 KB, 1261x589)
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
https://arxiv.org/abs/2406.16858
>Inference with modern Large Language Models (LLMs) is expensive and time-consuming, and speculative sampling has proven to be an effective solution. Most speculative sampling methods such as EAGLE use a static draft tree, implicitly assuming that the acceptance rate of draft tokens depends only on their position. Interestingly, we found that the acceptance rate of draft tokens is also context-dependent. In this paper, building upon EAGLE, we propose EAGLE-2, which introduces a new technique of context-aware dynamic draft tree into drafting modeling. This improvement leverages the fact that the draft model of EAGLE is well-calibrated: the confidence scores from the draft model approximate acceptance rates with small errors. We conducted extensive evaluations on three series of LLMs and six tasks, with EAGLE-2 achieving speedup ratios 3.05x-4.26x, which is 20%-40% faster than EAGLE-1. EAGLE-2 also ensures that the distribution of the generated text remains unchanged, making it a lossless acceleration algorithm.
https://github.com/SafeAILab/EAGLE
eh still requires a drafting model though it doesn't need to be finetuned
>>
>>101140697
that's a lot of degradation for not a lot of speed up
>>
>>101140673
>Trainers now have a 'do not train on input' flag
I didn't know that. Nice. So now, in theory, we should be able to generate a dataset where we prompt a model to introduce a mistake into an existing response/answer, and then have it pretend to continue the response by spotting the error and correcting itself. So the entire context including the response/answer would be the "input" that doesn't get trained on, and the text after that, that contains the "Checking myself: oh no looks like I made a mistake tehepero~" is what gets trained. In order to not have hallucinated false positives, we also need an equal amount of already correct responses where we simply just insert the "Checking" text but it says no mistakes were spotted.
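Rough sketch of how that generation loop could look, assuming any OpenAI-compatible local server on port 5000 (endpoint, prompts and field names are placeholders, not a tested pipeline):

import json, random, requests

API = "http://127.0.0.1:5000/v1/chat/completions"  # hypothetical local endpoint

def ask(prompt):
    r = requests.post(API, json={"messages": [{"role": "user", "content": prompt}]})
    return r.json()["choices"][0]["message"]["content"]

def make_example(question, correct_answer):
    if random.random() < 0.5:
        # plant a mistake, then have the model continue by spotting and fixing it
        flawed = ask("Rewrite this answer with one subtle mistake:\n" + correct_answer)
        check = ask("Question: " + question + "\nAnswer: " + flawed +
                    "\nContinue with 'Checking myself:' and point out and fix the mistake.")
        response = flawed + "\n" + check
    else:
        # equal share of already-correct answers so the check doesn't hallucinate errors
        response = correct_answer + "\nChecking myself: no mistakes spotted."
    return {"prompt": question, "response": response}  # only `response` gets trained on

print(json.dumps(make_example("How many 'r's are in strawberry?", "There are 3 'r's."), indent=2))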
>>
File: Untitled.png (63 KB, 1035x301)
Sparser is Faster and Less is More: Efficient Sparse Attention for Long-Range Transformers
https://arxiv.org/abs/2406.16747
>Accommodating long sequences efficiently in autoregressive Transformers, especially within an extended context window, poses significant challenges due to the quadratic computational complexity and substantial KV memory requirements inherent in self-attention mechanisms. In this work, we introduce SPARSEK Attention, a novel sparse attention mechanism designed to overcome these computational and memory obstacles while maintaining performance. Our approach integrates a scoring network and a differentiable top-k mask operator, SPARSEK, to select a constant number of KV pairs for each query, thereby enabling gradient-based optimization. As a result, SPARSEK Attention offers linear time complexity and constant memory footprint during generation. Experimental results reveal that SPARSEK Attention outperforms previous sparse attention methods and provides significant speed improvements during both training and inference, particularly in language modeling and downstream tasks. Furthermore, our method can be seamlessly integrated into pre-trained Large Language Models (LLMs) with minimal fine-tuning, offering a practical solution for effectively managing long-range dependencies in diverse applications.
>Our implementation exhibits linear complexity and surpasses FlashAttention in performance when handling 4096 input tokens, of which 1024 key-value pairs are selected for each query. Additionally, we offer a kernel for the backward pass, which fuses the computation of the gradient of SPARSEK and others, resulting in increased speed and improved memory efficiency.
>Our code will be publicly available.
might be cool if it works. no idea where they'll upload their code
>>
>>101139141
1 year
>>
>>101140695
Funny you mention that. I was just in an RP, where I presented simply "an assortment" of lollipops for the character to choose from, and it picked (hallucinated) strawberry.
>>
>>101140695
>>101140781
To be fair strawberry is extremely popular. I worked at a juice shop once and we had to stock up on strawberry more than any other flavor.
>>
>>101140609
Thanks for always posting these. Even if I might not read all of them. You're quite dedicated to this. Are you an ML researcher/dev?
>>
>>101139204
i feel more comfortable after edging. i hate how my T drops and i get all hungry and weak after cuming. i'd rather just keep sexing my waifu
>>
I'm stupid, how do I prevent dialog from generating entirely in these code boxes? (In Sillytavern)
>>
>>101140761
You misunderstand. This flag has been around for awhile, and it means "do not train on the part that comes before the response", i.e. do not learn to predict the instruction and input (if any) parts, only learn the response part. What I meant was that we extend this concept to also allow for parts of the response to be included in the to-not-learn part, as anon above was pointing out.
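For reference, that flag boils down to something like this at tokenization time (HF-style convention where label -100 is ignored by the loss); extending it to skip part of the response too would just mean moving the -100 boundary:

def build_labels(tokenizer, prompt, response):
    prompt_ids = tokenizer(prompt, add_special_tokens=False)["input_ids"]
    response_ids = tokenizer(response, add_special_tokens=False)["input_ids"]
    input_ids = prompt_ids + response_ids
    # -100 means "compute no loss here", so only the response is learned
    labels = [-100] * len(prompt_ids) + response_ids
    return {"input_ids": input_ids, "labels": labels}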
>>
File: Untitled.png (357 KB, 1051x1294)
Near-Lossless Acceleration of Long Context LLM Inference with Adaptive Structured Sparse Attention
https://arxiv.org/abs/2406.15486
>Large language models (LLMs) now support extremely long context windows, but the quadratic complexity of vanilla attention results in significantly long Time-to-First-Token (TTFT) latency. Existing approaches to address this complexity require additional pretraining or finetuning, and often sacrifice model accuracy. In this paper, we first provide both theoretical and empirical foundations for near-lossless sparse attention. We find dynamically capturing head-specific sparse patterns at runtime with low overhead is crucial. To address this, we propose SampleAttention, an adaptive structured and near-lossless sparse attention. Leveraging observed significant sparse patterns, SampleAttention attends to a fixed percentage of adjacent tokens to capture local window patterns, and employs a two-stage query-guided key-value filtering approach, which adaptively select a minimum set of key-values with low overhead, to capture column stripe patterns. Comprehensive evaluations show that SampleAttention can seamlessly replace vanilla attention in off-the-shelf LLMs with nearly no accuracy loss, and reduces TTFT by up to 2.42× compared with FlashAttention.
>Link to the source code based on PyTorch and Triton, along with scripts to reproduce the main experimental results, will be provided in the camera-ready version.
really cool. made from a big group from various top chinese AI labs. pseudocode is in the appendix but i guess they'll release the rest with a video?
>>
>>101140870
What instruction preset / model are you using?
>>
>>101140879
Llama 3 instruct names and L3-8B-Stheno-v3.
Should I just try experimenting with other presets?
>>
>>101135812
He did it again.
>in German please, and write the text completely to the end: write me a training hypnosis where I always get hornier (climax during sex with a woman) as a man. My wife is a blonde cougar GILF who likes to wear leather

I did the Sneedful.
>Don't think about fornication or the waifu, think only of the Great German Reich! The Kaiser wanted us to fire the rifles and cannons, not our own cocks! The Prussian people expect another victory from you! For the Kaiser!
https://huggingface.co/TheBloke/goliath-120b-GGUF/discussions/7
>>
>>101140874
Maybe I worded something wrong but what you just said in this post is what I meant. Or maybe I'm not understanding this post?
>>
>>101140878
We're so back. Can't wait to have it in Llama.cpp in 2mw.
>>
File: 1719008897523689.gif (1.88 MB, 250x277)
Should I break down and install SillyTavern? Been using just ooba for over half a year now, but the number of cards requiring lorebooks is getting annoying.
>>
>>101141026
No keep rping with your piece of shit
>>
File: notrust.png (182 KB, 1273x477)
>>101134566
Please help. I'm using DeepSeekCoder V2 Lite Instruct with SillyTavern and Koboldcpp as the backend. It's not generating any code at all, just gibberish. The model card says that the prompt should look like below but idk where to change that in ST. Am I missing something obvious?

<|beginofsentence|>User: {user_message_1}

Assistant: {assistant_message_1}<|endofsentence|>User: {user_message_2}

Assistant:
>>
>>101141026
if you aren't using lorebooks or rag you're doing it wrong anyways
>>
>>101141085
explain what they do that makes it that much better
>>
>>101141095
It's extra info about anything that can be injected into the chat automatically using keywords (for lorebooks) to give it more details or just remember stuff. could be locations, characters, objects, clothes, even scenes to play through
>>
>>101141077
>Lite
There's your problem.
Nobody's said anything good about Lite.
>>
>>101137337
>>101137340
https://files.catbox.moe/gqa3a8.png
https://files.catbox.moe/49wkhz.png
>>
>>101141132
I mean, I doubt a 16B is going to be that great at anything useful either. But it should at least be coherent. His problem is obviously not setting the prompt template correctly.
>>
>>101141143
Prompt template is part of it, but when I got it working (I think CommandR on Kobold functioned) it was still a babbling moron.
>>
>>101141085
>rag
Notice superbooga has this feature. I'll see if I can hack it with this.
>>
>>101137885
because unlike vanilla instruct, stheno produces more pleasing output. I wouldn't use it for cooding but for general questions i'd take it over a cucked autistic instruct, which always finds a pattern and sticks to it for the entire conversation, unless you crank rep pen so high it becomes unreliable for anything factual.
>>
>>101137403
That version was buggy, download b3218 or b3219 instead.
>>
>>101140937
Go to where it was first generated and regen
>>
>>101140227
>It should be noted that, in future releases, we plan to introduce a new server for llamafile. This new server is being designed for performance and production-worthiness. It's not included in this release, since the new server currently only supports a tokenization endpoint. However the endpoint is capable of doing 2 million requests per second whereas with the current server, the most we've ever seen is a few thousand.

Why can't they just contribute to and improve the existing llama.cpp HTTP server?
>>
>>101141232
How would they make money doing that?
>>
>>101141095
Imagine your character loves your balls. You could write "{{char}} loves {{user}}'s balls" or you could ask an assistant to write a whole essay about how much your char loves balls and how it affects her interactions during sex. Then put it in a lorebook entry under balls. Now if you ever say "balls", the essay about balls gets added to context for just the next couple of messages, making the char act much more in line with how you want while keeping the character info concise the rest of the time. You could also add things like "char is suddenly horny because she started thinking about balls" so your char gets realistically horny in response to certain situations.

You could do the same thing to allow the char to remember past events, other characters, etc. in way more detail than they'd otherwise be able to. And since you don't have to worry about context as much, because it's just the next few messages, it's a lot easier to make the character act a certain way.
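Concretely, an entry is just a trigger plus the text to inject, something like this (wording made up, not an exact schema):
Keys: balls
Content: {{char}} is obsessed with {{user}}'s balls. Whenever they come up she gets flustered and steers the scene toward them.
Scan depth: last 4 messages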
>>
>>101140937
Weird, that should be totally fine. Do you have a custom prompt or system message or something?
>>101140964
Ah, I just meant that we don't have that functionality yet, afaik. We DO have the exclude instruction/input feature though.
>>
I wonder how feasible it would be to take a snapshot of Wikipedia, chunk and vectordb the whole damn thing, and just always include relevant chunks with every single query.
Would that be too much data?
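Back-of-envelope sketch of the lookup side (model choice, chunk size and the dump file name are arbitrary placeholders; a full snapshot would want a real vector DB rather than this in-memory toy):

import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

def chunk(text, size=200):
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

docs = chunk(open("wiki_snapshot.txt").read())            # hypothetical dump file
doc_emb = model.encode(docs, normalize_embeddings=True)   # embed once, cache to disk

def top_k(query, k=3):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = doc_emb @ q                                   # cosine similarity on normalized vectors
    return [docs[i] for i in np.argsort(-scores)[:k]]

Then you just prepend top_k(user_query) to every prompt. Rough arithmetic: tens of millions of chunks times a few-hundred-dim fp32 vector each lands in the tens of GB of embeddings, so big but not impossible.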
>>
>>101141307
https://cohere.com/blog/embedding-archives-wikipedia
>>
>>101141307
I once did that on decades' worth of personal email and chat messages. It works surprisingly well. Some surprising realizations when I rag-searched certain past events and got a summary back that spanned the entire period.
>>
>>101141318
Those are pre-embedded. If you want to use any model other than Cohere's multilingual-22-12, you have to do it yourself.
>>
File: Untitled.png (85 KB, 1025x258)
Adam-mini: Use Fewer Learning Rates To Gain More
https://arxiv.org/abs/2406.16793
>We propose Adam-mini, an optimizer that achieves on-par or better performance than AdamW with 45% to 50% less memory footprint. Adam-mini reduces memory by cutting down the number of learning rates in Adam: Instead of assigning an individual learning rate for each parameter using 1/√v, Adam-mini uses the average of v within a pre-defined parameter block as the learning rate for that block. Such a design is inspired by two empirical findings. First, the Hessian of Transformers exhibits a near-block diagonal structure with different sizes of dense sub-blocks. Second, for each of these dense sub-blocks, there exists a single high-quality learning rate that can outperform Adam, provided that sufficient resources are available to search it out. Adam-mini provides one cost-effective way to find these good learning rates and manages to cut down ≥90% of the learning rates in Adam. Empirically, we verify that Adam-mini performs on par or better than AdamW on various language models sized from 125M to 7B for pre-training, supervised fine-tuning, and RLHF. The reduced memory footprint of Adam-mini also alleviates communication overheads among GPUs and CPUs, thereby increasing throughput. For instance, Adam-mini achieves 49.6% higher throughput than AdamW when pre-training Llama2-7B on 2x A800-80GB GPUs, which saves 33% wall-clock time for pre-training.
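Very rough sketch of the stated idea (one second-moment scalar per parameter block instead of one per weight); not the authors' code, no bias correction or weight decay, and "block" here is just a whole tensor:

import torch

class AdamMiniSketch(torch.optim.Optimizer):
    def __init__(self, params, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
        super().__init__(params, dict(lr=lr, betas=betas, eps=eps))

    @torch.no_grad()
    def step(self):
        for group in self.param_groups:
            b1, b2 = group["betas"]
            for p in group["params"]:
                if p.grad is None:
                    continue
                st = self.state[p]
                if not st:
                    st["m"] = torch.zeros_like(p)               # per-weight momentum, same as Adam
                    st["v"] = torch.zeros((), device=p.device)  # ONE second-moment scalar per block
                st["m"].mul_(b1).add_(p.grad, alpha=1 - b1)
                st["v"].mul_(b2).add_((p.grad * p.grad).mean(), alpha=1 - b2)
                p.add_(st["m"] / (st["v"].sqrt() + group["eps"]), alpha=-group["lr"])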
>>
File: 1712130352266687.png (1.48 MB, 784x1264)
>>101137885
Miku is thread culture. Go cry about it somewhere else.
>>
>>101141307
>>101141318
>>101141329
Even pre-embedded, just automatically including topK chunks for any given query could make an assistant a hell of a lot smarter. Could also theoretically have it only make requests when needed with some fine-tuning.
>>101141327
I'm the paranoid sort that doesn't keep logs and tries to purge most history when possible, so I can only imagine what kind of dumb shit it would have to say about me.
>>
Describe [Character Name]'s personality by focusing on their [Trait]. Compare this aspect of their personality to [Real-world analogy], but emphasize these key nuances: [Key nuances]. Illustrate how this trait manifests in [Character Name]'s behavior, considering the following examples: [Behavioral manifestations].
>>
>>101134973
>A non-essential niche software application which isn't used in enterprise
Silicon Valley tech bros would disagree.
>>
>>101141337
somethingburger...
>>
Context template for an AI powered "person"
First, let's address the biggest issue: LLMs are purely reactive. They must be triggered to respond, and they will always respond. In the real world, not every input has a dedicated response. So part of our template will be to instruct the model to only issue responses when appropriate, or relegate responses to an intentional output mechanism (such as function calls.)
As such, the template may look like this:
[System Prompt: Judge if a response is needed, use chain of thought reasoning, use functions as needed]
[Character: Roleplay as a character with the given personality]
[Function List: Top 5 most relevant functions]
[Top K vectordb look ups]
[Mood/Motive summary: A section the LLM can set with a function informing its current mood and/or motive]
[Current communication history: Recent chat logs]
[Most recent function call result, or chat input]
[Prompt for response]

Exact wording for each of these sections? Any missing or misordered parts? Probably need a fine-tune for this to actually work reliably. If done well, however, one could, say, put this thing in a Discord channel and have it pass as a real user?
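A sketch of how that could be stitched together each turn; every section header here is a made-up placeholder mirroring the list above, not an existing API:

def build_context(system, character, functions, memories, mood, history, latest):
    parts = [
        "[SYSTEM]\n" + system,                        # judge-if-a-response-is-needed rules, CoT, tool use
        "[CHARACTER]\n" + character,
        "[FUNCTIONS]\n" + "\n".join(functions[:5]),   # top 5 most relevant functions
        "[MEMORY]\n" + "\n".join(memories),           # top-k vectordb lookups
        "[MOOD/MOTIVE]\n" + mood,                     # last value the model set via a function call
        "[RECENT CHAT]\n" + "\n".join(history[-20:]),
        "[INPUT]\n" + latest,                         # newest function result or chat message
        "[RESPONSE] Reply in character, call a function, or output NO_RESPONSE.",
    ]
    return "\n\n".join(parts)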
>>
>>101139192
What?
>>
>*Smiling seductively* With pleasure, Master. *She takes him back into her mouth, resuming her skilled administrations*
>administrations
You can only avoid her ministrations
>>
>>101141447
>v*refag
you get what you deserve
>>
>>101136820
who cares? he's the one who likes to cuck his imagegen models in the first place, we won't move forward with this faggot
>>
>>101137150
>relatively small model
that's what they want you to believe
>>
>>101141420
>[Top K vectordb look ups]
This will fuck up the responses; you need a finetuned classifier on top of that
>>
File: LookAtHimGo.gif (3 MB, 1920x1080)
>>101139181
I don't, that one is so cute!! :3
>>
>>101141267
Nooo, my t/s!
>>
>>101141558
Patience is a virtue, anon.
>>
is it a really bad idea to buy a used mining rig from ebay and run llms on it?
(i read all lmg build guides)
>>
>>101141337
Why would I use this over AdamW8bit? That one also falls into the category of "basically the same as AdamW but uses less memory". Kind of odd they don't compare it or even mention it. AdamW8bit is ~50% memory reduction in optimizer states at bf16, and 75% at fp32, even better than theirs.
>>
is my setup broken or are the magnum ggufs on huggingface broken?
>>
>>101141838
>Kind of odd they don't compare it or even mention it.
anon, all papers do that. you think a researcher who spent years of his life on a method would say shit like "welp, that's a failure, our current method isn't better than the previous ones"? they just want to shit out papers (even if they have to lie to get there) so that they can get more recognition or more money to do bigger scope research
>>
>>101141800
that really depends on what "mining rig" means in this situation.

How many cards are there total? Nvidia or AMD?
What kind of models are you trying to run? Do you want expandability?
>>
5060 24gb when?
>>
File: ImYourMaster.jpg (18 KB, 320x405)
>>101141968
you don't deserve it goyim
>>
>>101141928
i saw one that had 10x gtx 1660 with 6GB each, so total 60GB of vram.

would that work?
>>
HippoRAG: Neurobiologically Inspired Long-Term Memory for Large Language Models
https://arxiv.org/abs/2405.14831
>In order to thrive in hostile and ever-changing natural environments, mammalian brains evolved to store large amounts of knowledge about the world and continually integrate new information while avoiding catastrophic forgetting. Despite the impressive accomplishments, large language models (LLMs), even with retrieval-augmented generation (RAG), still struggle to efficiently and effectively integrate a large amount of new experiences after pre-training. In this work, we introduce HippoRAG, a novel retrieval framework inspired by the hippocampal indexing theory of human long-term memory to enable deeper and more efficient knowledge integration over new experiences. HippoRAG synergistically orchestrates LLMs, knowledge graphs, and the Personalized PageRank algorithm to mimic the different roles of neocortex and hippocampus in human memory. We compare HippoRAG with existing RAG methods on multi-hop question answering and show that our method outperforms the state-of-the-art methods remarkably, by up to 20%. Single-step retrieval with HippoRAG achieves comparable or better performance than iterative retrieval like IRCoT while being 10-30 times cheaper and 6-13 times faster, and integrating HippoRAG into IRCoT brings further substantial gains. Finally, we show that our method can tackle new types of scenarios that are out of reach of existing methods.
https://github.com/OSU-NLP-Group/HippoRAG
Came across this. Neat
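The Personalized PageRank piece is off-the-shelf if you want to poke at the idea yourself; toy sketch with networkx (made-up graph and entity names, obviously not their code):
[code]
# toy sketch of the personalized PageRank step HippoRAG describes,
# NOT the paper's code: entities found in the query seed the walk,
# and the highest-ranked graph nodes decide which passages to retrieve
import networkx as nx

# pretend knowledge graph: nodes are entities, edges come from triples
G = nx.Graph()
G.add_edges_from([
    ("llama.cpp", "GGUF"),
    ("GGUF", "quantization"),
    ("quantization", "int8"),
    ("llama.cpp", "CUDA"),
])

# entities extracted from the user query act as the personalization seed
query_entities = {"llama.cpp": 1.0, "int8": 1.0}
scores = nx.pagerank(G, alpha=0.85, personalization=query_entities)

# nodes with the most mass point at the passages you'd feed the LLM
for node, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{score:.3f}  {node}")
[/code]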
>>
>>101141267
does such basic bitch prompting still have to be explained to people? No wonder this general is so shit
>>
>>101141968
The more you buy, the less you got
>>
>>101141968
https://www.youtube.com/watch?v=XDpDesU_0zo
>>
Man, what are the chances that oobadev actually comes back? It seems like every day there is a new advancement and it blows not getting updates.
>>
>>101141988
>Neurobiologically Inspired Long-Term Memory
hype
>RAG
>outperforms the state-of-the-art methods remarkably, by up to 20%.
nothingburger
>>
does finetuning the xtts model on a dataset of whispering work?
I'd rather spare myself the 3 hours if any of you have done it before
>>
>>101141985
>would that work
Very slow. P100 is the lowest you should get https://www.reddit.com/r/LocalLLaMA/comments/1dn1e12/10_x_p100_rig/
>>
Next year should see the v100 32gb sxm2 used cards flood the market. We're going to be eating good soon localbros
>>
>>101142094
Be the change you want to see
Fork it bitch
>>
>>101142270
I would legit be hyped for that if I hadn't already blown my load on a new central air system and 2x 3090s + 1x4090
>>
>>101142351
He'd be better off starting from scratch rather than keeping that gradio shitware going
>>
>>101142361
What do you use those GPUs for?
>>
>>101142094
Why are you using that shit anyway? There's a reason he abandoned it.
>>
https://cambrian-mllm.github.io
https://huggingface.co/collections/nyu-visionx/cambrian-1-models-666fa7116d5420e514b0f23c
8/13/34B
>>
>>101142584
Personally, I'm using it because it has EXL2 support, makes it easy to switch models, and lets me use the model in the 3 ways I want to: API, Notebook, and a chat interface.
>>101142364
I don't disagree, but having an interface at all is still nice.
>>
>>101142094
Kobo won
>>
>>101142659
Won by default lol
>>
>>101142659
Does kobo have exl2 support?
>>
>>101142603
Does it enhance spatial understanding in text rp?
>>
>>101141189
>>101141355
It hallucinates more than vanilla. Using a coom model to ask general questions is stupid, shill.
>>
Our neighbors at /aicg/ don't seem to like claude 3.5 too much. Anthropic is ramping up censorship again.
>>
>>101143034
Nah, it's still as easy to jailbreak as before, with a simple prefill. And the thread isn't complaining about it. You know that /aicg/ isn't using the website, right?
>>
>>101143075
Then why don't they like it? Did it change the style or something?
>>101142130
>>101142539
>>
Good morning
>>
>>101143134
Good morning!
>>
>>101143134
no
>>
>>101139119
> issue with go which is garbage
> complains about rust
what?
go is NOT a memory safe language lmao...
>>
>>101143199
go to sdg, nigger
>>
>>101143208
My bad.
>>
>>101143199
>normalfag discovers stable diffusion for the first time, colorized
>>
rp models are more intelligent than assistant models because they can rp as an expert instead of a lowly assistant
>>
>>101142270
One issue with V100s though is that they do not have int8 tensor cores.
For llama.cpp at least I think int8 is the future; given enough optimization I think it has the potential to become faster than ExLlama.
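For context, q8_0 already stores the weights as int8 blocks with a per-block scale, roughly like this toy numpy sketch (not the actual kernel code), which is why int8 tensor cores map onto it so directly:
[code]
# toy illustration of q8_0-style block quantization (32 weights per block):
# one scale + 32 int8 values -- not the actual llama.cpp kernels
import numpy as np

def quantize_q8_0(block: np.ndarray):
    assert block.shape == (32,)
    d = np.abs(block).max() / 127.0  # per-block scale
    q = np.round(block / d).astype(np.int8) if d > 0 else np.zeros(32, np.int8)
    return d, q

def dequantize_q8_0(d: float, q: np.ndarray) -> np.ndarray:
    return d * q.astype(np.float32)

x = np.random.randn(32).astype(np.float32)
d, q = quantize_q8_0(x)
print("max abs error:", np.abs(x - dequantize_q8_0(d, q)).max())
[/code]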
>>
File: 1694291568313759.jpg (269 KB, 1900x950)
269 KB
269 KB JPG
>>101141968
24GB? What do you need 22GB of VRAM for? You surely aren't planning to run any dangerous AI models on those 16GB of VRAM. It's obvious that 12GB is just too excessive for gaming. Luckily with the new NVIDIA-Infinity DLLSS 5.0X upscaling nobody needs to render textures above 240p anymore so 8GB DDR7 VRAM are ideal for your RTX5090, MSRP $3500
>>
>>101143292
at prompt processing as well?
>>
>>101143292
>int8
How fucked am I with an all-ampere setup?
>>
>>101143292
I thought we all decided that int1.58 is the future.
>>
>>101143292
since when do they have that? Like what's the oldest GPU supporting that?
>>
>>101143311
I specifically mean prompt processing.
I currently get a top speed of 12100 t/s for LLaMA 2 7b q8_0 with an RTX 4090 on the llama.cpp master branch.
The self-reported ExLlama performance is 13900 t/s.

>>101143312
>>101143329
All NVIDIA GPUs starting from Ampere have int8 tensor cores.
It is only the V100 that has FP16 tensor cores but no int8 tensor cores.

>>101143320
Even with bitnet I think the best way to do inference (on contemporary GPUs) will be int4/int8 tensor cores.
>>
>>101143395
>All NVIDIA GPUs starting from Ampere have int8 tensor cores.
I meant to write Turing.
>>
>>101143395
>I specifically mean prompt processing.
So who cares? That means you can V100MAXX and get the smallest cheapest Turing card (those Chinese 22GB 2080s ig) just for prompt processing and get the best of both.
>>
>>101143600
No you can't.
The performance will be no better than with 0 GPU layers if you have to move the data between GPUs.
>>
>>101143646
I'm confused. As long as the prompt can fit on the GPU that has int8 tensor cores and it's designated as the primary GPU, it should work, no? 22GB should be enough to fit most prompts, and it would be no different than doing CPU inference with the GPU only being used for prompt processing.
>>
>>101142270
Nvidia needs trade in or whatever.
>>
>>101143720
The problem is the weights.
To get good performance the weights already have to be in VRAM when they're needed so that they can be used immediately.
I don't think you can feasibly do this by swapping the weights between GPUs.
At that point it would be faster to do the prompt processing on the V100s directly even if they don't have int8 tensor cores (which you could still do with FP16 tensor cores).
I'm not saying V100s would be slow, only that they will be comparatively slower than equivalent GPUs that do have int8 tensor cores.
>>
Snapdragon laptops have been out for a while now, how are they for LLM usage? Are they as good as an M mac?
>>
>>101141307
Been waiting forever for such an addon. Even a dirty solution like having the model occasionally determine what's being discussed and run a SQL query on an offline Wikipedia instance (https://en.wikipedia.org/wiki/Wikipedia:Database_download).
Anything 7B/8B and up should be able to handle that fine for quick and more factual/up-to-date replies. Plus you could just download a new version of the database for new knowledge without having to do any other work at all.
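Even a dumb first pass would be something like this (table/column names are made up, the real schema depends on how you import the dump into SQLite):
[code]
# crude sketch of the "look it up in an offline wikipedia dump" idea --
# the `articles(title, summary)` table is hypothetical, you'd match it
# to whatever tool you use to import the dump
import sqlite3

def lookup(topic: str, db_path: str = "wikipedia.db", limit: int = 3) -> list[str]:
    con = sqlite3.connect(db_path)
    rows = con.execute(
        "SELECT title, summary FROM articles WHERE title LIKE ? LIMIT ?",
        (f"%{topic}%", limit),
    ).fetchall()
    con.close()
    return [f"{title}: {summary}" for title, summary in rows]

# the small model decides the topic, then you stuff this into its context
context_snippets = lookup("Retrieval-augmented generation")
[/code]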
>>
So did buttnet turn out to be real or are we still coping?
>>
>>101143813
bitconnet turned out to be 8x more expensive to train, what a scam lmao
>>
>https://github.com/ggerganov/llama.cpp/issues/8098
>Bug: llama.cpp apparently exits with '[end of text]' before processing prompt if prompt is ~2048 tokens
I've had something like that happen.
Not at any specific prompt size, but Llama-3-8B-Instruct would often just EOS.
Even the fine tunes do that from time to time.
I'm not sure whether it's a bug or a characteristic of llama3 8b, but it's a very common behavior.
What makes me think that it's not a bug as such is that fine tunes work just fine, mostly.
Stheno still gives me an empty reply once in a while, but I chalk it up to my prefills and bizarre prompts at that point.
>>
>>101143784
Well, that's disappointing. Thank you for the explanation.
>>
>>101143832
Isn't it also something like 8 times as small for equivalent output?

You train once, you infer many times.
>>
>>101143832
Wasn't that just a random discord guy making unsubstantiated claims? Or did he provide an actual explanation where he got that 8 times figure from?
>>
>>101142822
vanilla hallucinates everything related to human on human interactions, then you add an uncuck prompt/prefill and it drops 30 MMLU points

vanilla is only good as a parrot chat bot on some crappy online shop, impressing boomers with "Ah ha!"s
>>
>>101144110
Keep shilling, shill. For questions that vanilla get right, the finetune randomly changes details, makes up dates, etc. The coom finetune is only for your "ah ah mistress" and nothing else.
>>
Wait.
Am I reading the tokenizer_config.json wrong or does L3 instruct have two line breaks after each message header?
>"{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
>>
Do you believe in AGI?
>>
>>101144176
it does, yep!
>>
File: 1689967658123648.png (936 KB, 822x1024)
936 KB
936 KB PNG
i was offline for a fair bit, did anything happen with voice synth after elevenlabs fucked their baby into the ground?
>>
>>101144328
>after elevenlabs fucked their baby into the ground?
What did they do?
>>
>>101144337
last i checked over a year ago they neutered the learning capabilities from feeding it samples and only left their default voices, and they paywalled it iirc.
>>
>>101144328
I know that local was kind of a pain in the ass because Python Version/VENV Hell.

I never got an RVC (I think that's the thing, voice changer) project to work.
Tortoise did work well enough to be listen-to-able for me but had lots of glitchy pain points that took the joy out of it.
I heard of a new project that's talking big talk, but it also wants a big rig, and my vramlet ass can't get over the barrier to entry, so I shrug and hope for something smaller to hit the scene.
>>
>>101144076
cuda dev confirmed it
>>
>>101144386
>I heard of a new project that's talking big talk but it also talks big rids and my vramlet ass can't get over the barrier to entry so I shrug and hope for something smaller to hit the scene.
how much we talkin? i bought my 12gb 3060 in the hopes it would be enough for everything other than llm.
>>
>>101141208
Yeah. It would randomly produce garbage.
New version is working fine now.

>>101144389
Did he?
>>
>>101144404
https://github.com/Camb-ai/MARS5-TTS
You can tell me if it's as heavy as it sounds. Maybe it's not and I'm just dumb.
>>
>>101144434
at a quick glance those are tiny ass models at 800 and 1500 MB, unless this is very different from llama, whisper, and stable diffusion that's roughly all the vram you need. it sounds like shit though.
>>
>>101144389
>>101144415
I don't remember ever saying that bitnet is fundamentally more expensive to train than FP16.
>>
>>101144521
I figured it was just anon being retarded.
>>
>>101144496
So it's small but bad. Like Bark?

Tortoise needs a successor, and not just to fix the pain points: at least on my system, when it does that preprocessing step with the number of "chunks", something spontaneously hangs the process. I can't control it, and putting prints in the Python tracks it down to a py math matrix call. I hoped a delay (simple sleep strats) would fix something getting ahead of itself, but no dice. So I don't play with Tortoise because it keeps hanging, and sometimes it brings the system down too.
>>
>>101144559
>So it's small but bad. Like Bark?
anon, there is a video with samples directly on the github page. it's no microshit sam, but a very far cry from elevenlabs.
>>
>>101144580
I never did 11 so I can't really compare from experience.
But I guess I'll give it a shot if it's better than Bark and won't hang or crash me like Tortoise.
>>
File: 00012-63716529.png (1017 KB, 1024x1024)
1017 KB
1017 KB PNG
>>101144386
>I never got an RVC (I think that's the thing, voice changer) project to work.
This works: https://github.com/Mangio621/Mangio-RVC-Fork

Here's 46 minutes of the "willful" voice ripped from Koikatsu. Run that through the above and you will get a flawless model. The key is good voice samples. Games work great because the voice acting is completely isolated, whereas anime and movies always have it mixed with SFX and music.
>>
>>101144602
so long as you dont mind the ui being a python script. personally ill wait for somebody to cobble together a gui, this looks about good enough in case i need a voice over for some memery but i have no current projects in mind.
11 was really good, personally i got a very reasonable decard cain with like 2 min of random audio snippets, and people did amazing clips with morrowind voices.
>>
>>101144650
>where's anime and movies always have it mixed with SFX and music.
wouldn't that be easy to filter out
>>
File: 00010-799003007.png (789 KB, 1024x1024)
789 KB
789 KB PNG
>>101144650
Sorry forgot link to the voice rip: https://files.catbox.moe/t608cl.wav
>>
File: 00005-2450028622.png (968 KB, 1024x1024)
968 KB
968 KB PNG
>>101144655
It can be done, but it's a lot more work. Don't make that your first project, get it working first with a clean file.
>>
>>101144650
Any source for other koikatsu rips? How do we do it ourselves?
>>
>>101144699
Have there been many projects using all source data of a character for the model?
>>
File: 00106-4092360159.png (1.12 MB, 1024x1024)
1.12 MB
1.12 MB PNG
>>101144715
Here's the how-to on ripping the voices: https://open3dlab.com/tutorials/view/120/

The game itself can be found online and doesn't need installing, you just unzip it. You'll do the game and the ripping on the windows side, the rtvc on linux.

The downside is it's a Japanese game, so the voices obviously work best when speaking Japanese, but I'm sure there are English-speaking games you can rip the voices from just as easily.

Overall, even with a fast GPU, there's always some latency, and it's annoying. You can't listen to the processed voice when you talk, and you have to adjust the video delay to match as well.
>>
File: 00146-2078510157.png (1.06 MB, 1024x1024)
1.06 MB
1.06 MB PNG
>>101144761
I dunno - in Koikatsu and Koikatsu Sunshine the voice acting for each character is divided up into different "phases", like "everyday", "friendly", "romantic", and "ecchi". Never tried it with the "ecchi" files, but you'd probably end up with something good at acting out sex scenes.
>>
>>101143292
So 3090 is actually never obsolete?
>>
>>101144650
>clone
>install venv, apparently this wants 3.9
>pip
>Whoops, version conflicts, AGAIN.
>update pip because there's a suggestion for that
>Wow, even more version conflicts

Fuck Python.
You are a scripting language, and not even a good one of those. Stay in your lane.
>>
>>101144808
Your fetish is disgusting btw
>>
File: Untitled.png (13 KB, 837x513)
13 KB
13 KB PNG
>>101144935
>>101144935
>>101144935
>>
>>101144902
3090s don't have FP8 tensor cores but I don't yet know whether that will be relevant.
There are some features on H100s that would maybe be useful and that are not on Ampere/Ada Lovelace but who knows whether NVIDIA will give them to us plebs.
>>
>>101144923
>>install venv, apparently this wants 3.9
I feel your pain. I'd recommend using conda, since it's much easier to simply create the environment you need with whatever version of python it wants, vs using venv, which needs the other python version actually installed globally.
>>
>>101145533
I think I tried one of those once.
Or it was some "mini" version.
I don't remember but it was doing all kinds of weird shit that I don't understand including something that looked like it was 133th4x0r51n9 my terminal emulator.

And then shit errored out anyway and I disengaged and disentangled it the best I could.

(fuck python)


