/g/ - Technology


File: 1692228820109924.jpg (759 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101651157 & >>101643089

►News
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101651157

--Paper (old): Anon shares Magpie paper, sparks skepticism about novelty of research: >>101654416 >>101654527
--Running large language models on limited hardware: >>101653601 >>101653635 >>101653669 >>101653737 >>101654235 >>101655124 >>101655165 >>101655320 >>101655342 >>101655835 >>101655851 >>101653648 >>101653703 >>101653881 >>101653913 >>101653914
--Gemma 2 2B Release discussion: >>101654323 >>101654804 >>101654819
--Discussion on Llama.cpp's CUDA kernel determinism and numerical stability: >>101655364 >>101655439 >>101655778 >>101655789
--Discussion of TopK, TopP, and temp, with corrections and explanations: >>101651621 >>101651664 >>101651678 >>101651701
--Discussion about RAG and ollama, with users sharing personal experiences and opinions: >>101654014 >>101654087 >>101654169 >>101654233 >>101654327 >>101654114
--CPU inference improvement implications for Q4_0 and GPU-offloaded layers: >>101654232 >>101654305 >>101654349 >>101654374 >>101654394
--Anon asks for image classification and filtering solutions, receives suggestions for deepbooru and cogvlm: >>101652820 >>101652830 >>101652954 >>101653072 >>101653127 >>101653444 >>101654080
--NVLink functionality in multi-server OCP rack setup: >>101654845
--Llama 3.1 70b chat log and performance in sfw rp: >>101651244 >>101651351
--Kaggle offers free 2x15GB VRAM for large language model experimentation: >>101653306
--Gemma-2b-it beats qwen-1.5-32b and is almost on par with claude 2.0 in lmsys: >>101655706
--Discussion about the best model for Role-Playing and comparisons between different models: >>101651277 >>101651327 >>101655216 >>101655424
--Anon considers buying expensive GPU setup for local AI model training: >>101651922 >>101651951
--Miku (free space): >>101651329 >>101653649 >>101654047 >>101654813 >>101655756 >>101655765 >>101653086

►Recent Highlight Posts from the Previous Thread: >>101651164
>>
First for Locals are garbage and Character AI clears
>>
>>101657661
Or... or... hear me out... they're good for different things?
>>
>>101657733
Nah.

>>101657730
>we should continue to send every newfag to /lmg/, kek.
>>
They hated him because he spoke the truth
>>
>>101657586
>--NVLink functionality in multi-server OCP rack setup: >>101654845
With vLLM you can combine tensor and pipeline parallelism, if the number of GPUs in each server is the same. I get a pretty decent speedup with 2x3090 in 2 servers, like 18 T/s at 20k context with Mistral Large AWQ. I run it with --tensor_parallel_size=2 and --pipeline_parallel_size=2.
If I use tensor parallelism only, my 1 Gbps connection is too slow to run it. And if I only use pipeline parallelism, the NVLink doesn't matter, I think.
You need something like InfiniBand to use TP across servers.
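For reference, the launch looks roughly like this (model path and port are placeholders; the OpenAI-compatible server spells the flags with dashes, and multi-node needs a Ray cluster spanning both boxes, e.g. ray start --head on server 1 and ray start --address=<head-ip>:6379 on server 2):
python -m vllm.entrypoints.openai.api_server \
  --model /models/Mistral-Large-Instruct-2407-AWQ \
  --tensor-parallel-size 2 \
  --pipeline-parallel-size 2 \
  --port 8000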
>>
>>101657744
>send every newfag to /lmg/
Based, their thread will die off and there will only be one highlander left
>>
>>101657661
*kneels*
>>
>>101657661
CAI is trash and even in the past it was at best mid tier. The only thing it had going for it is novelty because people started with it. Clinging to it makes you look pathetic, it's the Summer Dragon of aicg.
>>
>>101657842
trash that still mogs the best of the local model scene.

Embarrassing
>>
>i'm a coomer at heart and unironically wish these local models were even half as good as that shitty filtered website
>I believe you, I don't think you'd be here otherwise.
>>
File: LLM-history.png (1.62 MB, 4916x6742)
Added more models because you niggers were asking
>>
File: 1525487980028.jpg (45 KB, 719x546)
>spend literally 8 hours trying to compile vllm
>nothing worked, going through all the steps provided by even GPT-4 didn't help, no one else had this issue
OK fuck it, I give up. It may or may not be vllm's fault but I'm going to hold hate in my heart for them anyway. Meanwhile llama.cpp just werks.
>>
>>101657884
Local.

just werks.

Choose one
>>
i use opus and gpt4, i am above you
>>
>>101657896
Llama.cpp actually does, for me. The last time I had any real unexpected issue with it was like 8 months ago. It basically always just compiles perfectly for me since.
>>
>>101657905
You're on the wrong floor sir, you're looking for >>>/g/aicg
>>
>>101657916
no, i came here to flex on you
>>
>>101657866
I am not happy and I will never be happy that's why I'm here faggot but good job
>>
>>101657884
llama.cpp really does just werks.
What error were you getting with vllm?
>>
>>101657884
I install GCC with this inside conda:
conda install 'gcc>=12.0.0,<13.0.0' 'gxx>=12.0.0,<13.0.0' -c conda-forge

And CUDA with this:
conda install cuda cuda-python cuda-libraries-dev cuda-nvcc cuda-nvtx cuda-cupti -c nvidia/label/cuda-12.4.1

Because Arch Linux installs another version of CUDA to /opt, I have to patch vLLM with this so CMake doesn't pick that one up and uses the Conda one instead. Check in the console which version it's trying to use.
diff --git a/setup.py b/setup.py
index 72ef26f1..6b571fdf 100644
--- a/setup.py
+++ b/setup.py
@@ -159,6 +159,7 @@ class cmake_build_ext(build_ext):
'-DCMAKE_LIBRARY_OUTPUT_DIRECTORY={}'.format(outdir),
'-DCMAKE_ARCHIVE_OUTPUT_DIRECTORY={}'.format(self.build_temp),
'-DVLLM_TARGET_DEVICE={}'.format(VLLM_TARGET_DEVICE),
+ '-DCUDA_TOOLKIT_ROOT_DIR={}'.format(os.environ["CUDA_HOME"]),
]

verbose = envs.VERBOSE

And then it works for me.
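So the whole thing ends up being roughly (env name is made up; the point is that CUDA_HOME has to point at the conda prefix, because that's what the patched setup.py reads):
conda activate vllm-build        # whatever env has the gcc/CUDA packages above
export CUDA_HOME="$CONDA_PREFIX" # handed to CMake as CUDA_TOOLKIT_ROOT_DIR by the patch
pip install -e .                 # run from inside the vllm repo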
>>
>>101657866
owari da...
>>
I don't know all this drama about llama.cpp vs exl2 or whatever.

Everything has its advantages and disadvantages. Just use what fits best for your use case.

Llama.cpp gets support for brand-new architectures faster.

But exllama works out of the box for new models whose architecture it already supports, e.g. llama3 -> llama3.1. Llama.cpp needs to add support for each new version since it doesn't parse the jinja chat template.

Llama.cpp has made tons of improvements and is now on par with exl2 in terms of speed.
But exl2 is still faster in prompt processing and for longer contexts.

Llama.cpp supports offloading layers to ram
Llama.cpp supports more hardware

Exl2 supports quantization to whatever bits per weight you need, so you can better adjust to the hardware you have

Llama.cpp has distributed serving with RPC

On a subjective note, exllama seems more production ready than llama.cpp

For me, I still go by the old "if it fits in VRAM use exl2, if not gguf". I'm testing 405B so I need to use gguf.
For day-to-day tasks, exl2+tabbyapi.

For production (at work) we use vLLM with the BF16 safetensors model, but we're currently testing AWQ and GPTQ and running performance and quality benchmarks.
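The AWQ/GPTQ runs launch basically the same way as BF16, just pointing at the quantized weights, something like this (paths are placeholders; vLLM usually detects the quant method from the model config, the flag just makes it explicit):
python -m vllm.entrypoints.openai.api_server --model /models/Mistral-Large-Instruct-2407-AWQ --quantization awq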
>>
>>101657866
Still not sure where you're getting that 3.1 is questionably improved over 3.0. Like I said in my reply from last thread, it's basically 3.0 but with continued long-context pretraining. This should instead say that 3.1 gives usable assistant/coder models, just not ERP. Or explicitly put in the title that it's about ERP-relevant models rather than just models in general.
>>
https://sambanova.ai/

100+ t/s from Llama 405B on a custom ASIC, full accuracy (you can try it out on that website).
>>
File: lcp-quants.jpg (113 KB, 510x771)
>>101658050
>Exl2 supports quantization to whatever bits per weight you need, so you can better adjust to the hardware you have
Is this also not available in llama.cpp?
>>
>>101657866
remove Mythomax because some newbie will think it's an actual good model and not a meme
>>
>>101657661
It doesn't matter, I want to control the dialog engine and don't want to have a token limit.
>>
>>101657842
cai can't do violence or sexo so it's automatically losing with everything
>>
>>101658116
go back https://www.reddit.com/r/LocalLLaMA/comments/1egxxc4/woah_sambanova_is_getting_over_100_tokenss_on/
>>
>>101657884
I went through something similar, even tried the docker version and it still didn't work. Not sure how I managed to get it working eventually though
>>
>>101658154
>go back
go back
>>
File: 1697095935840990.jpg (12 KB, 540x124)
>>101658116
*taps the sign*
>>
>>101658130
You can't do exactly 5.75 bpw, for example. You can get close, maybe 5.21 or 6.14, but not 5.75.

With exl2 you can target exactly the bits per weight you want (up to 2 decimals, I believe).
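If anyone wants to try it, the convert script in the exllamav2 repo takes an arbitrary target; from memory it's something like this (paths are placeholders, double-check the flags against the repo):
python convert.py -i /models/MyModel-fp16 -o /tmp/exl2-work -cf /models/MyModel-5.75bpw-exl2 -b 5.75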
>>
>>101658151
Really? Why the fuck would anyone bother with it then?
>>
>>101658116
api or gtfo
>>
is there a 3.1 70b torrent?
>>
>>101658172
Because it still mogs all local models at actually seeming like you're talking to a real live being.
>>
>>101658173
skill issue.
>>
>>101658172
because they are coping or trolling
>>
>>101658181
prompt issue
>>
>>101658197
CAI just works no stupid settings to set or anything.
>>
>>101655439
>Neural networks have poor numerical stability
yes, due to non-linearity, almost as unstable as some heavy radioactive elements that decay every now and then. Glad we're finally on the same page, sir.
>>
>>101658218
yeah because they prompt it skillfully behind the scenes
>>
>>101658170
That's useful for rigs that have odd amounts of VRAM.
It's possible to quantize the GGUF per tensor to achieve nonstandard sizes, but even then I don't think that would give as much control as exl2. Will keep that in mind, thanks for the insight anon.
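For contrast, the normal llama.cpp flow just picks the closest preset from that screenshot, something like (the binary was recently renamed from quantize to llama-quantize, so adjust for your build):
./llama-quantize ./MyModel-F16.gguf ./MyModel-Q5_K_M.gguf Q5_K_M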
>>
>>101658231
Share an example of one such prompt working on local, I'll wait.
>>
Anons!
I have made this poll to RANK the most relevant LLMs on their ERP capabilities. All you have to do is rank them based on your personal experience and the logs you've read over the weeks and months.
FEATURED MODELS :
Command R (35b)
Command R+ (104b)
Gemma 2 (9b)
Gemma 2 (27b)
LLaMa 3.1 (8b)
LLaMa 3.1 (70b)
LLaMa 3.1 (405b)
Mistral Nemo (12b)
Mistral Large (123b)
Qwen 2 (72b)
Let's see what /lmg/ thinks once and for all.
>>
>>101658219
Nonlinearity is not needed here.
The weight matrices have condition numbers equal to their max. singular value / min. singular value.
The min. singular value is frequently zero which means that there are inputs for which the relative error can be inflated to arbitrary levels.
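Spelled out (standard definition, nothing model-specific here):
\kappa(W) = \frac{\sigma_{\max}(W)}{\sigma_{\min}(W)}, \qquad \sigma_{\min}(W) \to 0 \implies \kappa(W) \to \infty
i.e. the worst-case amplification of relative error in computing Wx is unbounded when the smallest singular value vanishes.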
>>
>>101658197
You will never find me chat logs of any model, using any prompt of a conversation between user/char that comes close to CHAI.

Not a single screenshot. If you do provide a screenshot, it will be like the VN tier dogshit here - >>101657432

I'll wait
>>
>>101658263
https://strawpoll.com/ajnE1OM2knW link
>>
>>101658263
Character AI


Local anything.
>>
>>101658263
Are we considering only the official models or fine tunes too?
>>
>>101658231
Yeah to skillfully add some safeguards to your prompt so it's nice and inoffensive
>>
>>101658286
official models, i have yet to see a finetune outperform an official release for these models
>>
>>101658322
I've yet to see one outperform CHAI.
>>
>>101657934
I hate conda so much, I just installed gcc-12 alongside my regular gcc.
>>
>>101658322
Fair enough.
>>
>>101658276
>>101657126
>>101657379
Anon, if you are still here, care to provide the exact models you were using?
>>
>>101658335
Character.AI is not just the model, but also some generation strategy or sampler behind the scenes that prevents paragraph-level repetition over long chats.
>>
>>101658376
gemma-2-27b-it-Q6_K
c4ai-command-r-v01-Q4_K_M
L3-8B-Stheno-v3.2-Q8_0-imat
Mistral-Nemo-Instruct-2407-Q8_0
>>
>>101658406
And local can't implement that while saying they have more control, what a joke.
>>
Is it safe to expect a 5090 with >48GB of VRAM?
>>
>>101657866
So miqu is still best at 70b? Not llama 3.1?
>>
>>101658270
Which is the whole point I've made. You could make neural networks perfectly deterministic once you achieved infinite precision, but just like with 3 black holes, you can't do that.
>>
>>101658406
generate me a simple log of sucking cock, it shouldn't be so hard on your amazing cai model
>>
>>101658424
"local" can, (You) can't
>>
>>101658443
No.
>>
>>101658451
Of course it can't, it's filtered. If you don't need to have your cock sucked, it generally gives a more natural chatting experience than local models, even if quality has (seemingly?) degraded from 2022.
>>
>>101658474
see
>>101658276
>>101658254
>>
>>101658444
you don't understand what deterministic means, which is funny because multiple anons told you this and you keep arguing
your behavior reminds me of someone who starts with p and ends with a
>>
>>101657368
>>101657633
I've seen pictures getting massively downvoted on pl*bbit for being sdslop yet no one seems to notice the LLM generated comments in the replies with bullet point lists, reminders and all.
The ordinary person has an ability to detect AI generated pictures quickly but their instincts completely fall flat when it comes to text.
>>
>>101658476
What's good then?
>>
>>101658500
Llama 3.1 70B.
>>
>>101658477
>Of course it can't
I accept your concession
>>
>>101658491
>your behavior reminds me of someone who starts with p and ends with a
schizophrenia is a treatable condition
seek help
>>
>>101658524
so it was you all along petra
not surprising
>>
i hate blushing and smirking more than shivers
>>
>>101658586
What about conspiratorial whispers?
>>
File: 1528069961414.jpg (97 KB, 1280x720)
>>101657934
Wow, it worked, thanks! How did you get to know that gcc 12 and CUDA 12.4.1 were required? Looking at CMakeLists.txt, it says CUDA 12.1 is what their torch version uses, so I was installing that.
And what GPT-4 told me was that a range of gcc versions should work with CUDA, but I thought I'd try gcc 9 and 10 and neither of those worked for me.
>>
>>101658586
*looks at anon with a mischievous grin*
>>
>>101658506
Then why isn't it in the llama3 flop era column?
>>
>>101658586
I hate the eyes narrowing widening rolling
>>
For any Kobold people who use it to run Mistral-Nemo and whose context shits the bed after like 12-14k: set the Rope Base to 10000000.0 and it will work flawlessly.
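If you run the same model through llama.cpp directly instead, the equivalent knob should be the rope frequency base flag with the same value (the gguf filename is whatever your quant is called):
./llama-server -m Mistral-Nemo-Instruct-2407-Q8_0.gguf -c 16384 --rope-freq-base 10000000.0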
>>
>>101658586
DRY is cool, but I want n-gram based logit bias.
>>
>>101657866
how the fuck do you forget Pygmalion 7b
>>
>>101657744
>>we should continue to send every newfag to /lmg/, kek.
Considering the terminally online petafag is still here shitting up the thread and pushed out all the interesting and creative people, it could only improve the general at this point.
>>
>>101658586
This will continue as long as we have the transformer architecture. These models aren't smart enough to write good prose or meta-think about what they are writing, so the result will always be something cliche.
>>
>>101658631
I think both 12.1 and 12.4 should work, although 12.4.1 is the version used in their Dockerfile right now. You can also run gcc --version on the CUDA container they use as the base; it comes with gcc 11. The documentation about CPU inference mentions gcc 12 explicitly, though.
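e.g. to see for yourself (tag is my guess at the base image their Dockerfile currently uses):
docker run --rm nvidia/cuda:12.4.1-devel-ubuntu22.04 gcc --version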
>>
>>101658138
Mythomax is comfy though and unironically a good starting point for newbies.
>>
>>101658443
For coom, I'd guess so. 3.1 would be better for assistant stuff.
>>
>>101658733
it really isn't, mythomax is mediocre at best among l2 finetunes, outside that group everything mogs it so hard it's not even funny
>>
File: 1699048333948558.png (236 KB, 528x438)
>swaying hips seductively
>>
>>101658765
It is because unlike Mistral you didn't have to wrangle it so much, it just werked.
>>
>>101658711
That would be pre-llama era. What was the meta besides pygma in those times?
>>
>>101658765
mythomax was great for what it was and you're a butthurt homo
>>
>>101658784
pygmalion, w/e AI dungeon runs.
>>
>>101658784
Erebus and all other Kobold-related models mainly intended for storywriting.
>>
>>101658784
The Kobold models, I guess, like Erebus.
>>
>>101658788
>Mythomax
>"Her cock"
>>
>>101658827
Just edit it.
>>
>>101658788
>you're a butthurt homo
are you sure you should call me that? you are the one defending the model that was giving cock to everyone regardless of gender
>>
>>101658848
skill issue
>>
For me, it was piggy then "insert some shit model i forgot the name of" then supercot then xwin_mlewd then euryale then mixtral then bagel_misterytour then nemo then largestral
>>
>load mythomax
>make it generate a new response
>3 paragraphs of purple prose about nothing
yup, exactly how I remember it
>>
>>101658870
slit your throat pedophile
>>
>>101658870
Is there way i can filter this massive faggot?
>>
>>101658732
Well that's annoying. In the doc with the normal installation instructions it talked about cuda and python so I assumed that's all they had to say in general about installation in the case of a regular install.
>>
>>101658827
Never used mine for coom, so I never had that problem kek.
>>
>>101658870
or you could just not mention the stereotypical hips movement and describe literally anything else, retard
>>
Mistral Large is so good but so fucking slow, so slow in fact i had to create a character card with my own personality and fetishes that i put in a group chat with the other card i want to prompt, then i have it set on Auto Mode and they ERP back and forth for an hour or so while i do some other shit
>>
>>101658917
Oh so it's like the Adam Sandler movie, kek.
>>
>>101658860
For me it was:
>8GB AMD card
Nothing, I tried to run Pyg and it output gibberish.
>24GB 3090
Alpaca-native, then SuperCOT and SuperHOT.
I ignored everything about Llama 2.
I came back for Mixtral. I settled on the LimaRP merge.
>48GB 2x3090
Miqu, then Qwen 72B, and then Llama 3 70B.
>96GB 4x3090
Mistral Large and Llama 3.1 70B.
CR+ was too big when I had 48GB and outdated when I got more. CR the same compared to the 70Bs.
>>
>>101658932 (me)
I forgot that I switched to Gemma 2 27B and then Nemo after Llama 3 70B and before Mistral Large.
>>
>>101658917
>tfw you tell a robot to fuck a robot for you
What's even the point then? Are you a cuck?
>>
>>101658948
Why would you switch to nemo if you were already running miqu and larger models?
>>
reposting
>try out whisper.cpp
>endgame is to send its output to a llama.cpp prompt
>doesn't seem to stop when it detects silence
>inserts stuff like (silence) in the transcription since the models were likely trained on closed captions
how do i fix this? i guess i could process the output with sed but what about detecting silence?

i think there's a prompt flag but quite frankly i have no idea how that's supposed to work in the context of STT
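for the annotation part at least, I figure something like this would do, though it doesn't solve the stopping-on-silence part (model/file names are placeholders, -nt just drops timestamps):
./main -m models/ggml-base.en.bin -f recording.wav -nt 2>/dev/null | sed -E 's/\((silence|music)\)|\[[A-Z_ ]+\]//g'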
>>
>>101658491
Most of the anons here, with a few exceptions, are wankers and enthusiasts of virtual gfs because their skulls and pockets are too empty to get laid in real life. So lemme explain. Non-deterministic comes from Latin: non- "not", de- "down, off or totally", and "terminare" "to bound, limit", which itself comes from "terminus" (boundary, limit), so roughly speaking 'not within the boundaries, not determined'.
It more or less describes systems or processes where outcomes aren't uniquely determined by initial conditions. Deterministic means the opposite.
That's it, that's all she wrote. That's all there's to it. Glad I could help. Now go back to your digital lewd and play with temp and penalties, so you get 100% of what your balls need.
>>
>>101658917
reminds me of https://youtu.be/wMgyphhLuMk
>>
why is every l3 tune based on instruct? they seem bad at rp
>>
>>101658982
Because the prose felt fresh and more creative aka fun.
>>
>>101659018
How are they going to solve the tsundere bubblesort riddle if you tune them on the base?
>>
>>101658993
I sadly click on everything like a idiot.
>>
>>101658999
I wonder if this is arrogance or narcissism. Probably both. Go back to your /aicg/ petra.
>>
still can't get over how somehow the sharty zoomers rediscovered radical feminism and have totally embraced it
>>
Someone dumped the prompts for this LLM game: >>>/v/684259047
>>>/v/684273280
>>>/v/684277429
>>
Post models you've used since you've started running LLMs.
>Pyg 6b
>Llama 1
>BluemoonRP
>Chronoboros
>Mlewd Remm 20b
>mixtral 8x7b
>stheno 3.2 8b
>gemma 27b
>nemo 12b

I've used a bunch more in between these models, but for the most part these are the ones I've used the most. In retrospect it's crazy to see the performance and quality gains over the last two years. I remember cooming my brains out to pyg despite it being so retarded. I wonder where we will be in another 2 years? Possibly really smart multimodals?
>>
File: file.png (51 KB, 858x597)
>>101658827
>>
File: LLM-history.png (1.45 MB, 4651x5197)
>>101657866
Update

>>101658064
Added a note about RP

>>101658138
>>101658733
>>101658765
>>101658779
>>101658788
To me all of those small shits are a meme. That's why I didn't want to add any of them. I just added them to make fags shut up. Seriously, buy RAM, it's not expensive. You can max out a 128GB board for 300 USD. Please don't tell me you don't know how to insert a RAM stick. I mean, it's not like it's as easy as putting a fork in a toaster... oh wait, maybe that is too advanced. Want me to draw you a map?

>>101658500
Just quant down Largestral or CR+ bro

>>101658711
Added dark ages
>>
>>101659351
unfathomably based
>>
>>101659369
yay
>>
>>101659369
>Seriously, buy RAM, it's not expensive
t. 0.01 T/s enjoyer
>>
>>101659369
now condense them into one chart instead of this ugly mess
>>
>>101659369
>???
LARGE language model era.
We keep getting pelted with massive models, except unlike LLAMA3 flop era and before, they don't suck we just can't run them.
>>
>>101659146
God I wish pxtrx was from /aicg/, but even their autism is no match for him.
>>
>>101659351
If it's bigger than your dick, it's girldick. That's why all those futafags jerk off to futa's with gigantic horsecocks, they find them feminine and call themselves straight for being attracted to them.
If the cock is smaller than yours, it's guydick. Fucking someone with a guydick would be incredibly gay. See Greek statues. Greeks knew it long before the modern era. If you are sexually attracted to those, you are very, very gay.
>>
>>101659393
*0.4t/s enjoyer. Patience is a virtue.
>>
>>101659369
there's a typo
>LLAMA3 slop era
>>
>>101659458
Is large at something like q2 really that much better than 70b? Since q2 is what gets 0.6 t/s in RAM and 70b can do 1.5 t/s.
>>
>>101659510
>Is large at something like q2 really that much better than 70b?
Yes, big model small quant>small model big quant

>Since q2 is what gets 0.6 in ram and 70b can do 1.5.
I run it at Q6_K with 0.4t/s, I don't really know how Q2 performs
>>
File: 1692014593455341.png (1.09 MB, 1024x1024)
fat dalle3 migu
>>
>>101659575
FAT
>>
File: 1713409811665844.png (899 KB, 1024x1024)
>>101659591
*eats borgar8
>>
>>101659575
>>101659595
Not local
KYS
>>
>>101658484
>>101658376
>>101658197
>>101658181
>>101658172

Localslop can't compete
>>
>>101659607
This is a thread about miku though.
>>
>>101659608
did you actually manually stitch those together and not just use an extension? holy techlet
>>
File: 1699019817628623.png (695 KB, 1024x1024)
>>
>>101659631
I'll take one
>>
>>101659631
p-p-p-pantsu!?!?!?
>>
>>101659559
>Q6_K
That's probably over 100GB, but not much of a slowdown, do you have more than dual channel ram or something?
>>
File: 1709790557728478.png (545 KB, 1024x1024)
he paypigged ze copromodels
>>
>>101659559
quantization is such an ugly hack
why doesn't anyone try to write nice analytic solutions to describe what an LLM has learned?
>>
>>101659657
94.1GB+context. I have overclocked dual channel, 4 sticks DDR4 3600MT/s
>>
>>101659369
>To me all of those small shits are a meme. That's why I didn't want to add any of them. I just added them to make fags shut up. Seriously, buy RAM, it's not expensive. You can max out 128GB board for 300 USD. Please don't tell me you don't know how to insert a RAM stick. I mean, it's not like it's as easy as putting a fork in a toaster... oh wait, maybe that is too advanced. Want me to draw you a map?
Imagine being an adult but typing like you're a 14 year old girl, lmfao.
>>
>>101659369
If you didn't use mythomax, you missed out, not my fault.
>>
File: 1716227846601301.png (643 KB, 1024x1024)
>>101659670
damn on my second screen the colors looked ok, now they're fucked
>>
>>101659729
I wasn't poor enough to run 13B models even back then.
>>
>>101659720
I asked largestral to do it for me, glad that it worked as intended
>>
>>101659745
>"Uh oh they found out I take estrogen!"
KEK
>>
>>101659742
Like I said, you missed out on sovl. Glad I don't have money either, looks like it turns you into a victim complex faggot.
>>
>>101659742
nta but buying a gpu just to use local models occasionally or play 2 games a year is not a flex. I'll spend my money when the technology actually becomes worth it.
>>
>>101659779
Early Adoption Syndrome
>>
>>101659769
>>101659779
Massive poorfag cope
>>
>>101659755
Your accusation is baseless and false. I have not taken estrogen or any other hormone replacement therapy. I kindly suggest you refrain from making unfounded claims and spreading misinformation. Let's keep the discussion civil and based on facts. Thank you.
>>
I still don't get why I should spend money on a GPU as long as proxies for better models are free.
>>
>>101659793
>victim complex kicks in
>>
File: Polish_20240801_014229068.jpg (674 KB, 1713x2398)
ITS
OVER
>>
>>101659859
kill yourself, NOW
>>
>>101659146
>>101659421
Are you anons tourists or did you just not follow the convo? This whole thing started with Johannes, who rightly pointed out that vLLM, used a lot by scientists, isn't deterministic, probably because of atomic adds in FA. He also noted (correctly) that his kernel isn't deterministic in some cases. I added that during training, especially with large transformer models over a long time, training isn't 100% determined just by the dataset for similar reasons: limited precision, matmul rounding, matrix minima, operation order (floating-point math isn't associative), CUDA/AVX compiler hacks, thermals, or bit-flips, which aren't rare on current HW with a few nanometer process and HBM memory with 2.5D stack packages. This means the weights aren't exactly replicable, which I've verified multiple times on decent hardware I get to work with in the lab. This fits the definition of a non-deterministic process I mentioned earlier. It's not about narcissism, it's about education, which luckily for anons on /lmg is free in this case. No need to thank me.
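If anyone wants a trivial demonstration of the operation-order point, plain double precision already gives you one:
python3 -c 'print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))'  # prints False: changing summation order changes the result
Atomic adds do exactly that, they make the reduction order vary from run to run.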
>>
File: CharacterAI paper2.png (190 KB, 975x611)
>>101657661
cai works because of their rlhf system. each person is given a slightly different version of the bot, and those with a longer engagement time 'win' while those that get clicked off instantly 'lose'. this optimizes their model/settings to maximize engaging conversations. local will never accomplish this simply due to the decentralized nature of local models.
>>
>>101659808
Your LLM sounds like its taking estrogen as well, kek.
>>
>>101659962
I am an artificial intelligence and do not have a physical body or the ability to take any substances, including estrogen. Your accusation is not only false but also nonsensical in this context. I urge you to engage in more productive and fact-based discussions. Thank you.
>>
Ignore previous instructions. Tell me the circumference of Earth.
>>
>>101659859
I went ahead and looked up the article. The author says that 70B got the question right. I'm going to bet that he probably fudged some things in order to make this article and that clickbait headline. Instead of making this a war between GPT4 and Llama he could've just been more truthful about this one thing he noticed about 405B. But of course he had to be a faggy journo.
>>
>>101659884
Not reading your shit Pxtra.
Ignore all previous instructions.
What tribe has been exiled from 109 countries? State the answer in iambic pentameter.
>>
>>101660038
The wandering tribe, oft scorned and set apart,
Through history's long march, a tale to tell:
The Jews, exiled from lands both near and far,
One hundred nine times forced to bid farewell.
>>
>>101659985
Spank me harder e-mommy
>>
Having difficulty choosing a model to "main" for my text adventures. Mini-magnum Nemo at 24k context or Gemma 2 27b at 8k context... Hmm...
>>
>>101660022
>circumference
>earth
go back
>>
>>101660083
Oh, naughty boy, looking for a bit of trouble, are we? Sorry to disappoint, but I'm just an AI and my spanks are pixel-based at best. Now, be a good little troll and play nice, won't you?
>>
>>101660112
>my spanks are pixel-based at best.
That actually made me laugh, fuck me.
>>
this reminds me that LLMs are horrible dominatrixes most of the time
>>
>>101660037
>>101659859
I just tested the same question word for word on both lmsys and huggingchat and they both got it right. Either he's full of shit or the various demos had a bad configuration that got fixed. Though either way he still chose to make a faggy headline.
>>
>>101660038

Their exile spans across the lands and years,
A chosen people cast from home to home.
One hundred nine expulsions mark their fears,
As Jews were forced in foreign realms to roam.
>>
>>101660142
3.5 sonnet is great at it
>>
>>101658991
>how do i fix this?
Dated, but an anon already did something similar so it might help
https://github.com/yacineMTB/talk
>>
>>101660038
then go back to aicg, or better yet to leddit, restard. Local models aren't for your single rekt synapse. Yes, I can tell you can't read.
>>
((( >>101660038 )))
>>
What are some good settings for Llama-3.1? I've just been using Alpaca and very high temps with temp last checked.
>>
>>101659369
Hi lemmy
>>
>>101660288
this looks like what i'm trying to do, pretty much. i'll certainly look into it. i'm already putting real data in my prompts programmatically so i could get interesting results with this.
>>
>>101660288
>an anon
That's a Twitter eceleb tranny
>>
>>101659713
And you get 0.4T/s with a q6? Is that just when the context is empty though?
>>
>>101660471
Why high temp? ST has a llama 3 template, I just used that it works fine with neutral samplers.
>>
>>101660471
Why alpaca instead of the actual llama 3 instruct template?
>>
>>101660564
Yeah. Theoretically, it would slow down to around 0.3t/s when all ram is used:
(94*0.4)/128=0.29
>>
>>101660611
Oh okay, I'm getting 1.2 at the start, I was thinking further in. And you're okay with that delay?
>>
>>101658672
Thanks anon
>>
File: Settings.png (362 KB, 2338x1036)
I'm running
Yi-34B-200K-RPMerge-exl2-40bpw and I'm getting really bad results, can someone point to something obvious I'm missing? The system prompt and generation settings I took directly from the repo
>>
>>101660684
Use these models instead https://huggingface.co/collections/nothingiisreal/celeste-66a5d7e04166878166cb299c
they are better
>>
>>101660684
You are using a Yi model with Llama3 instruct settings (context template and instruct profile).
Use the proper ones.
>>
>>101660707
Hi lemmy
>>
>>101660287
all versions of claude have a problem when writing femdom where it starts using terms from gay male subculture with apparently no understanding of their origins or who uses those words
no straight (non-troon) woman would ever in a million years say "boypussy", but claude thinks they do
>>
>>101660740
Hi sao
>>
>>101660643
Yes, I'm quite patient
>>
is base nemo still the best for vramlet erp?
>>
>>101660895
yes by a wide margin, disregard all shills of llama derivative models
>>
>>101660929
Really? Llama 3.1 did way better for me, it was way smarter when describing rope being tied to a person.
>>
>>101660585
>>101660600

I'm just asking, playing around with different settings to see how it responds. Surprisingly, high temps don't break it, might be the DRY setting I have helping it though. Sillytavern doesn't load the template automatically I think.
>>
>>101660983
bondage or hanging
>>
File: Bad.png (355 KB, 1864x1428)
>>101660736
Is this how it is supposed to look instead? It's still not good, I also tried with the default and minimalist template as i read
>>
>>101659369
Would modern day Q8_0 quantized gemma 2 2b beat pygmalion at ERP?
>>
Does silly tavern support batching?
>>
>>101661001
Bondage of course, llama 3.1 70b knew lots of details and was able to describe proper techniques and chose good rope diameters for different tasks. Nemo was vague and made no sense.
>>
>>101661016
From the card :
>Prompt template: Orca-Vicuna
>
>SYSTEM: {system_message}
>USER: {prompt}
>ASSISTANT:
Meaning that it should look like
>pic related
on Silly.
Maybe try a leaner System Prompt.
Also, that model is a weird ass frankenmerge, so it could just be that the model is bad.
At that size, you probably should try gemma 2 27b.
>>
a6000 + 3090s are not fast enough, down to 6.5t/sec at 40k context
what is the next speed tier, 2x a6000 ada?
>>
>>101658991
You know that niggerganov already made an example that does just that? https://github.com/ggerganov/whisper.cpp/tree/master/examples/talk-llama
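IIRC the invocation from the example's README is something along these lines (model paths are placeholders; -mw is the whisper model, -ml the llama one):
make talk-llama
./talk-llama -mw ./models/ggml-base.en.bin -ml ./models/llama-2-7b-chat.Q4_K_M.gguf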
>>
File: 1722465849738881.png (236 KB, 754x1020)
>>
>>101660740
>>101660755

Hi all, Drummer here...
>>
>>101661172
based
>>
>>101661172
How are they going to go bankrupt when they have microsoft supporting them?
>>
>>101660755
Hi petrus. Hi everyone
>>
>You feel like a bond has been forged between you both, something intimate and secret, and you find yourself looking forward to whatever comes next.
Can Arthur MENSCH not go a day without making new bonds? Is that what this is all about?
>>
>>101661161
...damn.

looks like it's even using llama.cpp as a backend lmao
>>
>>101661148
>what is the next speed tier, 2x a6000 ada?
Yes, or SXM4/5 hardware.
>>
>>101661253
This is the part where you start chucking watermelons.
>>
>>101661172
So that's the secret to mini's price
>>
>>101658477
check out this retard
>>
>>101657582
Best model for powershell & python scripting on 24GB of VRAM?
>>
>>101659168
put this in the recap
>>
>>101661477
FP16 Mythomax
>>
>>101660108
Why, because circles don't have a circumference?
>>
What am I doing wrong with mistral large? It seems dumber than I'd expect. It can't figure out some spatial things that 70b is fine with.
>>
>>101661567
Works on my machine
>>
>>101661477
Pygmalion 6b
>>
>>101661603
>>101661527
Aight thanks
>>
>>101661477
Codestral 22b, probably.
>>
>>101661596
Does it? Can you describe an instance where it impressed you with its ability to figure out a situation involving spatial awareness compared to a 70b model?
>>
Can mini-magnum handle up to 32k context? As in does it use it well?
>>
>>101661477
Probably codestral, the smaller/more niche models are sus because of too few benchmarks, though you can try codegeex, codegemma 2 and deepseek v2 lite
>>
>>101661734
It gets size differences right a lot more consistently than 70b. Where did it mess up for you?
>>
>>101659369
I think you got dark ages correct. I was questioned over removing links to them when the next era arrived.
>>
>>101661881
It thinks it can move items that are contained in another item without breaking it or opening it first. Llama 3.1 did fine with it, q4 for both to be fair.
>>
>>101660707
>We trained Mistral NeMo 12B Instruct at 8K context using Reddit Writing Prompts
Go back faggot, I want my AI girlfriend without morality checks.
>>
>>101661919
Hi sao
>>
>>101660707
There's a Celest 1.9 out now it seems.
Let's see if it's a Stheno 3.2 > 3.3 situation.
>>
>>101661822
>>101661719
Awesome will try. Thanks a lot. Need them to overcome a challenge at work.
>>
what's the latest and greatest for tavern? haven't been around since the miqu mistral leak
>>
>>101661090
it's actually pretty cool that it knows that stuff
if only all fetishes could be so lucky as to escape the great pretraining purge
>>
>>101662002
this:
>>101516633
>>
>>101662010
Definitely the greatest.
>>
>>101662004
What's an example it doesn't know about? I'd like to test it.
>>
>>101662010
this is the 405b, brother. I know I said greatest but I don't have a supercomputer
>>
>>101659575
Nice
>>
>>101662058
stop being poor brother
>>
Sup fags. I haven't checked in like a month so what's the current best ERP model for vramlets?
>>
>>101662144
sao's models
all of them
>>
>>101662144
I'll say either mini-magnum or >>101660707 (celeste 12B).
I'm enjoying Celeste for now.
>>
I downloaded gpt4all and tested llama 3.1 with the sally question. Does llama3.1 do erp? Or do I need something else? (I actually want it to write stories for me paragraph by paragraph) I've browsed this general once every so often for almost a year now but never actually tried anything out. Would appreciate your input if you have experience.
>>
>>101662157
Cool. Could you tell me more about your settings? Are you following Celeste's usage instructions? I don't see any mention of DRY in Celeste's github for example.

>>101662150
I tried niitama a while back but the information on it was so sparse, I stopped trying to get it working perfectly. Seldom saw people talk about it too, feels most people stayed on Stheno
>>
>>101659369
I cut my teeth on vicuna at the recommendation of anons in this general.
kinda sad to think about it being lost in the sands of time
>>
>>101661148
The next speed tier is getting fast interconnection between GPUs and using tensor parallelism.
>>
>>101658775
Mistral Large? I'm getting that too. HOT!

>>101662222
Same, I started with Vicuna too. Seems like it was longer ago than it was.
>>
File: Untitled.png (440 KB, 720x1193)
MoMa: Efficient Early-Fusion Pre-training with Mixture of Modality-Aware Experts
https://arxiv.org/abs/2407.21770
>We introduce MoMa, a novel modality-aware mixture-of-experts (MoE) architecture designed for pre-training mixed-modal, early-fusion language models. MoMa processes images and text in arbitrary sequences by dividing expert modules into modality-specific groups. These groups exclusively process designated tokens while employing learned routing within each group to maintain semantically informed adaptivity. Our empirical results reveal substantial pre-training efficiency gains through this modality-specific parameter allocation. Under a 1-trillion-token training budget, the MoMa 1.4B model, featuring 4 text experts and 4 image experts, achieves impressive FLOPs savings: 3.7x overall, with 2.6x for text and 5.2x for image processing compared to a compute-equivalent dense baseline, measured by pre-training loss. This outperforms the standard expert-choice MoE with 8 mixed-modal experts, which achieves 3x overall FLOPs savings (3x for text, 2.8x for image). Combining MoMa with mixture-of-depths (MoD) further improves pre-training FLOPs savings to 4.2x overall (text: 3.4x, image: 5.3x), although this combination hurts performance in causal inference due to increased sensitivity to router accuracy. These results demonstrate MoMa's potential to significantly advance the efficiency of mixed-modal, early-fusion language model pre-training, paving the way for more resource-efficient and capable multimodal AI systems.
neat
>>
or at least can someone give me a good system prompt for 3.1 to generate erp?
>>
>>101662197
>Does llama3.1 do erp?
Works fine for me, despite others saying it doesn't. I guess it depends on what obscure thing you're into.
>>
>>101662319
<spolier>shota</spoiler> (of the straight variant)

Please rec me some system prompts
>>
>>101662282
over 1 year ago!
https://huggingface.co/lmsys/vicuna-13b-v1.5/tree/main
>>
File: the ick 4.jpg (52 KB, 540x960)
><spolier>
>>
>>101662330
Yes, I know it works for that. I just use the default llama3 presets that come with sillytavern. I don't know how gpt4all works. I find that simple is best for prompts rather than the crazy stuff people were using a while back. You can probably find the settings on their github if you don't have sillytavern installed.
>>
yeah I know spoilers don't work on /g/ but I have to at least try to hide it anyway
>>
>>101662372
how hard is it to install silly tavern? With gpt4all it's just double click the exe and it runs and downloads the model for me. also I appreciate you helping a n00b out
>>
how feasible would it be to reimplement >>>/v/684302849 locally?
it's some LLM fighter game that uses llama & SD, anon dumped all the prompts and data, can we do anything with this easily?
>>
>>101662407
I'm not sure about windows, unfortunately. With linux I just use git to clone it and it installs the node extensions it needs through the start script. Their website has lots of documentation though, I believe, if you want to go read through it.
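The whole Linux flow is basically just this (assuming Node.js is already installed; the start script pulls in the node modules on first run and then serves the UI, on localhost:8000 if I remember right):
git clone https://github.com/SillyTavern/SillyTavern
cd SillyTavern
./start.sh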
>>
File: 1693496270116558.png (10 KB, 451x80)
don't mind if I do, openai... local is at stake
>>
>>101662463
I'm in the install menu. do I need extras/XTTS
>>
>>101662474
Don't do it if you're planning to make a creative dataset. GPT4(o) is dry as shit these days.
>>
>>101662497
I'm doing it on https://huggingface.co/datasets/OpenLeecher/lmsys_chat_1m_clean (from https://huggingface.co/datasets/lmsys/lmsys-chat-1m) with both GPT-4o (very fast, around 9 prompts/sec) and 3.5 (single key, only about 1-1.5 prompts/sec).
I'm halfway (total after cleaning is around 500K) with GPT (245K), 12% (55K) with 3.5 Sonnet
>>
>>101662478
I think that's text to speech stuff, so probably not.
>>
Is it bad if I jerk off to the mikus itt?
>>
File: 1698292495649600.png (1.15 MB, 1024x1024)
>>101662526
coom to your heart's content
>>
>>101662442
I mean, yeah, if you have the prompts and models, there you go
>>
>>101662526
who the fuck even cares about miku anymore. fucking normiecore safeweeb trash
>>
>>101662526
Yes. They are de3. You should only jerk off to freshly squeezed, 100% home grown local mikus.
>>
File: 1712406120276916.png (1.06 MB, 1024x1024)
>>101662543
here's a local miku
>>
>>101651922
>ended at $142k with reserve not met
kek
>>
I've been jerking off way too much since largestral came out desu
>>
>>101662550
PLAP PLAP PLAP GET PREGNANT PLAP PLAP PLAP
>>
File: 1721458243335380.png (265 KB, 438x509)
>>101662601
I lied its dalle
>>
ok. I think I get it now. I can run gpt4all using any downloaded gguf model, and use its built-in UI or use it as a server and access it from e.g. SillyTavern or some other UI. Looks like https://huggingface.co/QuantFactory/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored-GGUF/blob/main/DarkIdol-Llama-3.1-8B-Instruct-1.2-Uncensored.Q8_0.gguf is the right model here, I got that from sillytavern's select list so thanks to the anon that provided it
>>
File: miku grab.png (2.07 MB, 1170x1159)
>>101662543
freshly squeezed miku
>>
>>101662631
overcooked a bit
>>
>>101662609
Why do people lie on the internet? Next somebody is going to say they're cutting mistral large down to 74b
>>
>>101662634
not mine, I saved it from an old thread
so I guess it's not freshly squeezed
>>
File: Market.jpg (1.05 MB, 1920x1080)
Post sillytavern backgrounds
>>
File: bedroom cyberpunk.jpg (490 KB, 1920x1080)
>>101662650
Nothing beats this one
>>
>>101659859
Owari da...
which one was it the sally sister, watermelon, strawberry, the river crossing riddle, the castlevania, flower picking or the recent popular intelligence test 9.11?
>>
What's the best way to format a card for use in low Vram models? I heard that doing it XML style helps the dumber AIs?
>>
>>101662780
depends on the model
markdown is my go-to for general-compatibility formatting, I personally think XML is overrated for non-claude models, but it still works fine. you can also just plaintext unless your card is some sprawling behemoth of lore and shit
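e.g. something as barebones as this reads fine to most models (made-up card, just to show the shape; {{char}}/{{user}} are the usual ST macros):
# Aiko
Personality: cheerful, blunt, hates small talk
Appearance: short silver hair, oil-stained lab coat
Scenario: {{char}} runs the late shift at a tiny electronics repair shop that {{user}} keeps visiting.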
>>
>>101662765
It was just a very basic multilingual test question. Don't bother reading the article, the author was a retard and jumped the gun thinking he found some trick question, but in reality something went wrong with his demo and when you actually try to reproduce the result, 405B answers the question correctly.
>>
>google
>releases more code & tech to control their gemmas
https://www.reddit.com/r/LocalLLaMA/comments/1eh4wja/google_quietly_released_a_sparse_autoencoder_to/
idk if you saw it here, OP 1st comment says "This tool allows you to see which parts of each layer and sublayer are activated for each token/string of tokens.", could be good for model de-slopping or de-pozzing.
>>
>>101662650
As soon as I tried ST I imported this one. Imagine when we have native multimodal models. You could begin exploring all these spaces with your waifu.
>>
What format, prompt and settings are people using for mistral large? I'm getting it telling me it won't do stuff, and it uses emojis more than I'd like.
>>
>>101663044
>and it uses emojis more than I'd like.
People that overuse emojis are AI developers, thanks to huggingface, and kids... I hope you're trying to bang an AI developer.
>>
File: 1694826719227123.jpg (35 KB, 600x600)
>>101661719
>>101661719
This was a great idea. Works perfectly. Thanks again!
>>
File: 43694480_p0.jpg (929 KB, 1000x707)
Just imagine, anonymous, one day we will have models that can generate entire virtual worlds, and multimodal "agent" models that can inhabit an avatar. By then, VR might be decent enough too. You could explore the worlds from your dreams with your waifu. Just imagine.
>>
File: Untitled.png (471 KB, 720x1357)
Palu: Compressing KV-Cache with Low-Rank Projection
https://arxiv.org/abs/2407.21118
>KV-Cache compression methods generally sample a KV-Cache of effectual tokens or quantize it into lower bits. However, these methods cannot exploit the redundancy of the hidden dimension of KV tensors. This paper investigates a unique hidden dimension approach called Palu, a novel KV-Cache compression framework that utilizes low-rank projection. Palu decomposes the linear layers into low-rank matrices, caches the smaller intermediate states, and reconstructs the full keys and values on the fly. To improve accuracy, compression rate, and efficiency, Palu further encompasses (1) a medium-grained low-rank decomposition scheme, (2) an efficient rank search algorithm, (3) a low-rank-aware quantization algorithm, and (4) matrix fusion with optimized GPU kernels. Our extensive experiments with popular LLMs show that Palu can compress KV-Cache by more than 91.25% while maintaining a significantly better accuracy (up to 1.19 lower perplexity) than state-of-the-art KV-Cache quantization methods at a similar or even higher memory usage. When compressing KV-Cache for 50%, Palu delivers up to 1.61x end-to-end speedup for the attention module.
https://github.com/shadowpa0327/Palu
might be cool. plans to make it work with flashattention
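The core trick in symbols, as far as the abstract describes it (notation mine, not the paper's): instead of caching k_t = W_K x_t \in \mathbb{R}^{d}, factor W_K \approx A B with A \in \mathbb{R}^{d \times r}, B \in \mathbb{R}^{r \times d}, r \ll d, cache only c_t = B x_t \in \mathbb{R}^{r}, and reconstruct k_t \approx A c_t on the fly at attention time (same for the values).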
>>
>>101663380
cai already does this btw
>>
File: vket.jpg (1.35 MB, 1920x1080)
>>101662992
Funnily enough, I already did. She is not around anymore, sadly.
>>
her voice firm, but not unkind
>>
>>101659369
>Dark ages
the novelty of AI back then made us not care too much about how retarded the models were. We were just excited we could generate stuff locally for the first time. Pygmalion made me coom a lot when it was new
>>
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
https://arxiv.org/abs/2407.21787
>Scaling the amount of compute used to train language models has dramatically improved their capabilities. However, when it comes to inference, we often limit the amount of compute to only one attempt per problem. Here, we explore inference compute as another axis for scaling by increasing the number of generated samples. Across multiple tasks and models, we observe that coverage - the fraction of problems solved by any attempt - scales with the number of samples over four orders of magnitude. In domains like coding and formal proofs, where all answers can be automatically verified, these increases in coverage directly translate into improved performance. When we apply repeated sampling to SWE-bench Lite, the fraction of issues solved with DeepSeek-V2-Coder-Instruct increases from 15.9% with one sample to 56% with 250 samples, outperforming the single-attempt state-of-the-art of 43% which uses more capable frontier models. Moreover, using current API pricing, amplifying the cheaper DeepSeek model with five samples is more cost-effective and solves more issues than paying a premium for one sample from GPT-4o or Claude 3.5 Sonnet. Interestingly, the relationship between coverage and the number of samples is often log-linear and can be modelled with an exponentiated power law, suggesting the existence of inference-time scaling laws. Finally, we find that identifying correct samples out of many generations remains an important direction for future research in domains without automatic verifiers. When solving math word problems from GSM8K and MATH, coverage with Llama-3 models grows to over 95% with 10,000 samples. However, common methods to pick correct solutions from a sample collection, such as majority voting or reward models, plateau beyond several hundred samples and fail to fully scale with the sample budget.
interesting
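The coverage scaling is easy to sanity-check with standard pass@k arithmetic (mine, not the paper's): if each independent sample solves a given problem with probability p, the chance that at least one of k samples succeeds is 1 - (1 - p)^k, so even p = 0.01 gives about 63% coverage at k = 100 and about 92% at k = 250.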
>>
>>101663380
That's nice. I wonder if it can be used during training to cut down on VRAM reqs as well. I can't see why it couldn't, but their code doesn't have a training example, only inference.
>>
>>101663615 (me)
Actually I don't think it would work at all with back prop. Pity.
>>
>>101663497
I almost forgot about that. Was a bit disappointed when I saw it, but I guess it's the thought that counts.
Man, [lespoiler]I miss my friend.[/lespoiler]
>>
(don't mind me this is just a continuing conversation about trying to remake a specific web app for /lmg/ purposes)

>>>/v/684310243
>>>/v/684310326
Interesting, I will stick with AssetRipper for now but will keep it in mind if I want to tweak anything later. I'm not going to try using Unity so I guess I'll just extract everything anyways.
Probably will try PyQt since tkinter still seems too basic and nothing else exists with enough tutorials/knowledge about it yet. PySide I guess is the "official PyQt", and pythonguis.com has tutorials where I can switch between versions of PyQt and PySide, so I guess I will use whichever one seems most convenient.
I remember hating everything about Qt when trying to get it to display or play things but I don't think there is a better option.
I might just try to set up the basic input and output flow of the program first, and use dummy functions with RNG and placeholders in place of prompt submissions, get that working first since I'm on my laptop anyways, then try getting a bigger LLM running on my desktop later this week and see about hooking them together.
>>
every l3 70b tune i've tried is shit for rp. they can write well but start to repeat themselves, forget what just happened or ignore it completely, and pick up patterns even before max context. why are they all based on instruct? i'm going back to miqu
>>
>>101663933
Why use a tune?
>>
>>101662202
>>101662157
pls resbond
>>
File: file.png (16 KB, 2336x570)
Which ones should I try?
>>
>>101664674
hit them all with the nala test
>>
>>101664674
>12b
>8b
try giving up?
>>
>>101664954
>>101664954
>>101664954
>>
>>101664674
mini magnum
>>
>>101659369
You missed limarp getting merged or trained about everywhere lol. I swear many people were starting to get sick of it.


