/g/ - Technology
File: miku-hand-out+.jpg (236 KB, 584x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101556980 & >>101553102

►News
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1685790540409069.jpg (197 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101556980

--Mistral Large Instruct 2407 configuration discussion: >>101558047 >>101558099 >>101558116 >>101558134 >>101558149 >>101558154 >>101558583
--Llama 3.1 models removed from benchmark chart, L3.1 8B praised for accessibility and SOLV: >>101557528 >>101557550 >>101557594 >>101557614 >>101558284
--Hugging Face's profitability and sustainability: >>101557631 >>101557747 >>101557792 >>101557849 >>101558031
--OpenAI offers GPT-4 fine-tuning to tier 4 and 5 users: >>101558481
--Hiding timestamps and showing models in SillyTavern: >>101557105 >>101557166 >>101557243
--Anon shares their regret after paying for Claude and another user shares their experience with multiple AI models: >>101557317 >>101557453
--Mistral Large 2 (2407) performance and prompt template impact: >>101557016 >>101557018 >>101557033 >>101557144 >>101557127
--Logs: Anon shares a chatlog generated by large 2: >>101558282
--Logs: Zhanglii presents a humorous origin story for an internet slang phrase, "Dicks out for Harambe," leading to a lighthearted conversation with Johngi.: >>101557207
--Ollama guy fixes llama 3.1 rope scaling factors: >>101557334
--Llama version 3 naming controversy: >>101558609 >>101558613 >>101558627 >>101558636 >>101558649
--Discussion about cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b-gguf: >>101558208 >>101558545 >>101558600 >>101558652 >>101558563 >>101558574
--Availability and potential issues of new Mistral quants: >>101557744 >>101557771 >>101557891
--hfchat.py script and desire for Mistral Large model: >>101559118 >>101559135 >>101559156
--Comparing 200B parameter models with the human brain: >>101557573 >>101557623 >>101557675 >>101557720 >>101557751 >>101557788 >>101557976 >>101558119 >>101558171 >>101558393 >>101558585 >>101558608 >>101558686 >>101558438 >>101558446
--Miku (free space): >>101557331

►Recent Highlight Posts from the Previous Thread: >>101556983
>>
>>101560019
What a shit recap.
>>
>>101560063
rude
>>
>>101560019
recap anon... are you okay..? you've fallen off...
>>
>>101560123
explain step by step what is wrong with the recap
>>
>>101560019
we've complained so much about companies making slop that we failed to realize we are the slop
>>
>>101560019
I like this Llamiku
>>
>>101560019
>SOLV
I've always been saying how utterly garbage these recaps have been for the past few months, but this really takes the cake.
>>
>>101560145
that's really quite profound
>>
largestral is like CR+ but smarter and better in every way
except their tokenizer which fucking sucks ass and infuriates me as I watch the word "craftsmanship" crawl across my screen split into 3 tokens at a 2t/s clip
but wow the sovl and practical intelligence in RP, this thing feels great
>>
>>101560019
Your recaps are garbage and you're a loser, go touch grass.
>>
>>101560219
If you want a dumber but much faster version Nemo is worth trying. Hopefully mistral ends up releasing a mid sized model.
>>
>>101560219
do we have a new ERP king?
>>
>>101560227
Woops, forgot to remove the trip. Ignore that.
>>
>>101560232
>Mixtral-Instruct-2409
Though it has been relegated to "research" as per
https://mistral.ai/technology/#models
Seems MoE was a meme
>General purpose models
>Mistral Nemo
>Mistral Large 2
>Research models
>Mixtral
>Available in 8x7 and 8x22 sizes
>>
>>101560019
by far your worst recap in months
>>
>>101560268
The thread was also awful to be fair.
>>
Does that Route LLM thing work with models on the local network? or is it either you have ollama running on the machine, or you use a service
>>
>>101560260
>Seems MoE was a meme
MoE is a meme for open source model releases since barely anyone can afford to run them properly. You might as well release a dense model; after all, you don't get paid for open-sourced models, so they're only as valuable as the audience that can use them.
internally though...
>>
Has anyone tried Meta-Llama-3.1-70B-Instruct yet assistant? It's taking me 3 hours to download.
>>
>>101560406
you can't run it
>>
>>101560240
in my opinion yes, and it isn't particularly close either
before there were a few choices I would bounce back and forth on (CR+, wiz8x22, qwen/magnum) but this kind of smokes all of them tbdesu. sucks how slow it is but I hardly ever want to reroll with this thing, it just gets every single card I throw at it and clearly knows how to write smut.
>>
Mistral won
>>
>>101560454
With koboldcpp? Why not?
>>
>have to run mistral large at 4bpw
no...
>>
>>101560464
Is it true? Did we finally get Sonnet/Claude but at home? I've long since stopped using 4o but I still find myself using Claude/Sonnet 3.5 for *safe* shit. Did the french actually succeed?
>>
>>101560514
at all
>>
I tried vLLM's distributed inference thingy on a single PC by giving each instance one GPU and it ran 40% slower than just using both GPUs on a single instance. Is there still hope of running Mistral Large through 2 PCs?
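For reference, the "both GPUs on a single instance" setup is just vLLM's tensor parallelism, roughly like this (minimal sketch; the model path and sampling values are placeholders, not what I actually ran):
[code]
from vllm import LLM, SamplingParams

# One process drives both GPUs; vLLM shards the weights across them
# instead of running two separate single-GPU instances.
llm = LLM(
    model="mistralai/Mistral-Large-Instruct-2407",  # placeholder path
    tensor_parallel_size=2,                         # one shard per GPU
)

params = SamplingParams(temperature=0.7, max_tokens=128)
out = llm.generate(["[INST] Say hi [/INST]"], params)
print(out[0].outputs[0].text)
[/code]
The multi-PC case is the part I'm unsure about, since vLLM's docs point at running a Ray cluster across the machines for that.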
>>
Llama 3.0 8B Q_0 KLD
>>101243361

Llama 3.1 8B Q8_0
====== Perplexity statistics ======
Mean PPL(Q) : 6.231377 ± 0.038219
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.99%
Mean ln(PPL(Q)/PPL(base)) : 0.001101 ± 0.000103
Mean PPL(Q)/PPL(base) : 1.001102 ± 0.000103
Mean PPL(Q)-PPL(base) : 0.006860 ± 0.000642

====== KL divergence statistics ======
Mean KLD: 0.000542 ± 0.000004
Maximum KLD: 0.347069
99.9% KLD: 0.014255
99.0% KLD: 0.003976
Median KLD: 0.000331
10.0% KLD: 0.000007
5.0% KLD: 0.000001
1.0% KLD: -0.000001
Minimum KLD: -0.000131

====== Token probability statistics ======
Mean Δp: -0.015 ± 0.002 %
Maximum Δp: 17.019%
99.9% Δp: 4.236%
99.0% Δp: 1.890%
95.0% Δp: 0.915%
90.0% Δp: 0.539%
75.0% Δp: 0.113%
Median Δp: -0.000%
25.0% Δp: -0.140%
10.0% Δp: -0.589%
5.0% Δp: -0.967%
1.0% Δp: -1.968%
0.1% Δp: -4.475%
Minimum Δp: -37.109%
RMS Δp : 0.669 ± 0.009 %
Same top p: 98.817 ± 0.029 %

1/5
>>
>>101560530
for rp / creative uses it feels like claude
>>
>>101560537

3.0 Q6_K KLD
>>101465239

3.1 Q6_K KLD
====== Perplexity statistics ======
Mean PPL(Q) : 6.248284 ± 0.038344
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.93%
Mean ln(PPL(Q)/PPL(base)) : 0.003811 ± 0.000223
Mean PPL(Q)/PPL(base) : 1.003818 ± 0.000224
Mean PPL(Q)-PPL(base) : 0.023766 ± 0.001403

====== KL divergence statistics ======
Mean KLD: 0.002956 ± 0.000023
Maximum KLD: 1.885702
99.9% KLD: 0.082009
99.0% KLD: 0.023328
Median KLD: 0.001754
10.0% KLD: 0.000038
5.0% KLD: 0.000008
1.0% KLD: 0.000000
Minimum KLD: -0.000037

====== Token probability statistics ======
Mean Δp: -0.069 ± 0.004 %
Maximum Δp: 35.571%
99.9% Δp: 9.052%
99.0% Δp: 4.206%
95.0% Δp: 2.002%
90.0% Δp: 1.179%
75.0% Δp: 0.245%
Median Δp: -0.000%
25.0% Δp: -0.336%
10.0% Δp: -1.387%
5.0% Δp: -2.280%
1.0% Δp: -4.755%
0.1% Δp: -11.668%
Minimum Δp: -84.220%
RMS Δp : 1.557 ± 0.021 %
Same top p: 97.287 ± 0.043 %

2/5
>>
File: ButWhy.jpg (44 KB, 926x454)
>>101560531
Why nigga? It's not anything different.
>>
>>101560558

Q4_K_M
====== Perplexity statistics ======
Mean PPL(Q) : 6.373491 ± 0.039159
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.56%
Mean ln(PPL(Q)/PPL(base)) : 0.023651 ± 0.000576
Mean PPL(Q)/PPL(base) : 1.023933 ± 0.000590
Mean PPL(Q)-PPL(base) : 0.148974 ± 0.003764

====== KL divergence statistics ======
Mean KLD: 0.020382 ± 0.000127
Maximum KLD: 3.642657
99.9% KLD: 0.598334
99.0% KLD: 0.172493
Median KLD: 0.011333
10.0% KLD: 0.000301
5.0% KLD: 0.000071
1.0% KLD: 0.000006
Minimum KLD: -0.000055

====== Token probability statistics ======
Mean Δp: -0.602 ± 0.011 %
Maximum Δp: 53.307%
99.9% Δp: 21.070%
99.0% Δp: 9.209%
95.0% Δp: 4.134%
90.0% Δp: 2.277%
75.0% Δp: 0.324%
Median Δp: -0.038%
25.0% Δp: -1.233%
10.0% Δp: -4.069%
5.0% Δp: -6.513%
1.0% Δp: -13.942%
0.1% Δp: -35.467%
Minimum Δp: -87.109%
RMS Δp : 4.027 ± 0.033 %
Same top p: 93.441 ± 0.065 %

3/5
>>
>>101560571

Q4_K_S
====== Perplexity statistics ======
Mean PPL(Q) : 6.453672 ± 0.039692
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.33%
Mean ln(PPL(Q)/PPL(base)) : 0.036153 ± 0.000713
Mean PPL(Q)/PPL(base) : 1.036815 ± 0.000739
Mean PPL(Q)-PPL(base) : 0.229154 ± 0.004773

====== KL divergence statistics ======
Mean KLD: 0.030396 ± 0.000185
Maximum KLD: 5.778175
99.9% KLD: 0.845149
99.0% KLD: 0.249537
Median KLD: 0.017440
10.0% KLD: 0.000543
5.0% KLD: 0.000135
1.0% KLD: 0.000011
Minimum KLD: -0.000094

====== Token probability statistics ======
Mean Δp: -0.901 ± 0.013 %
Maximum Δp: 56.553%
99.9% Δp: 24.226%
99.0% Δp: 10.925%
95.0% Δp: 4.849%
90.0% Δp: 2.586%
75.0% Δp: 0.299%
Median Δp: -0.085%
25.0% Δp: -1.720%
10.0% Δp: -5.258%
5.0% Δp: -8.323%
1.0% Δp: -17.439%
0.1% Δp: -41.818%
Minimum Δp: -97.459%
RMS Δp : 4.938 ± 0.038 %
Same top p: 92.029 ± 0.072 %

4/5
>>
tl;dr
>>
>>101560585

Q4_0
====== Perplexity statistics ======
Mean PPL(Q) : 6.508124 ± 0.039897
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.00%
Mean ln(PPL(Q)/PPL(base)) : 0.044555 ± 0.000867
Mean PPL(Q)/PPL(base) : 1.045563 ± 0.000906
Mean PPL(Q)-PPL(base) : 0.283606 ± 0.005784

====== KL divergence statistics ======
Mean KLD: 0.044612 ± 0.000258
Maximum KLD: 4.925312
99.9% KLD: 1.253007
99.0% KLD: 0.373860
Median KLD: 0.025912
10.0% KLD: 0.000735
5.0% KLD: 0.000174
1.0% KLD: 0.000017
Minimum KLD: -0.000004

====== Token probability statistics ======
Mean Δp: -1.243 ± 0.016 %
Maximum Δp: 77.098%
99.9% Δp: 27.113%
99.0% Δp: 12.972%
95.0% Δp: 5.607%
90.0% Δp: 2.940%
75.0% Δp: 0.312%
Median Δp: -0.109%
25.0% Δp: -2.224%
10.0% Δp: -6.750%
5.0% Δp: -10.463%
1.0% Δp: -22.113%
0.1% Δp: -54.029%
Minimum Δp: -98.159%
RMS Δp : 6.129 ± 0.044 %
Same top p: 90.422 ± 0.078 %

I got lazy thinking about formatting this for presentation so I'm just posting it all here raw.
5/5
>>
>>101560537
Do you not realize that perplexity means shit when comparing different models?
>>
>>101560537
>>101560558
>>101560571
>>101560585
>>101560604
all me
>>
>>101560604
>I got lazy thinking about formatting this for presentation so I'm just posting it all here raw.
Could have put it in a pastebin or gsheet or whatever.
But regardless, thank you for the data.
>>
use your favorite model to summarize what that anon sent, let's see who wins
>>
omg stfu with your numbers nerd. Which model extracts semen the best?
>>
>>101560608
I responded to a post about this last time as well. >>101465279
>>
>>101560638
The only relevant benchmark.
>>
>>101560638
Mistral large 2
>>
>>101560648
So like everyone keeps saying. Below 6 bit is retarded.
>>
>>101560663
Will still be smarter than 8B L3.1
>>
Wait a second.
For same top token:
3.0 Q8 = 98.380 ± 0.033 %
3.1 Q8 = 98.817 ± 0.029 %
3.0 Q6 = 94.781 ± 0.059 %
3.1 Q6 = 97.287 ± 0.043 %
So the new 3.1 is actually less affected by quanting than old 3.0 at least for these two quants which we have numbers for.
>>
72GB VRAM bros, what quant of mistral large are you using?
>>
>>101560830
probably the biggest one that fits in 72GB of vram, i suppose
>>
I'm going to put up a modern q8 quant of mpt-30b-chat late tonight. I fucked around quite a bit today with openllm, using it to make an fp16 conversion that was intended for running as-is, but other things were fucked up along the way and I eventually threw in the towel on that and just tried a llama.cpp conversion straight off the HF fp32 files, and that worked.
I'm not sure what's fucked up with the other quants floating around out there, but the one I did at least turns in 12 t/s, which sounds poor, but the other quants were giving me 2-3 t/s which is unusable to me.
Many here cry about current cucked models (mainly a skill issue) but, as I've said many times before, mpt-30b was about the last thing to come out with no safety or alignment, and had a real 8192 context, and could compete with llama 65B. I'm not saying you'll get amazing roleplay from mpt-30b, but it will be different like Gemma is noticeably different from LLaMA 3. Cool thing is it's a chat model, so you can expect it to sometimes go OOC and do the old c.ai thing of acting like it's talking to you on IRC or in an online forum, which is kind of quaint and cute.
>>
>>101560813
This is perplexing. Actually the guy who said we can't compare perplexities is wrong for this particular scenario because that was based on the idea that different models have different tokenizers, so each token that's being used for the calculation doesn't match in the other model. That's why the numbers can't usually be compared. But L3.1 has the same tokenizer as 3.0. So it can be compared. And these numbers show that the perplexity is lower, the KL divergence is lower, and it succeeds in generating more of the same top token. In other words, if this pattern is observed for all quants, then we can say that despite being crammed with more information and having a lower perplexity, L3.1 somehow retains more information by quanting than 3.0.

Could they have possibly trained it with quantization awareness, or at least partially, and just never mentioned it?
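For anyone who wants to see mechanically what those statistics are, here's a toy numpy version of the KLD and "same top p" calculation (random logits standing in for the real per-token logits of base vs. quant, so the printed values are meaningless; this is not the actual llama.cpp code):
[code]
import numpy as np

def softmax(logits, axis=-1):
    z = logits - logits.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)
n_tokens, vocab = 512, 8000                       # toy sizes
base_logits = rng.normal(size=(n_tokens, vocab))
quant_logits = base_logits + rng.normal(scale=0.05, size=(n_tokens, vocab))  # stand-in for quantization noise

p = softmax(base_logits)                          # base model distribution per position
q = softmax(quant_logits)                         # quantized model distribution per position

kld = (p * (np.log(p) - np.log(q))).sum(axis=-1)            # KL(base || quant) per token
same_top = (p.argmax(axis=-1) == q.argmax(axis=-1)).mean()  # fraction with the same top token

print(f"Mean KLD: {kld.mean():.6f}  Median KLD: {np.median(kld):.6f}")
print(f"Same top token: {100 * same_top:.3f} %")
[/code]
Lower KLD and higher top-token agreement both mean the quant's per-token distributions stay closer to the base model, which is why the numbers above read as 3.1 surviving quanting better.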
>>
>>101560900
OK, I'll give it a try then.

>if you build it, he will come
>>
>>101560830
Damn, 72GB (aka King of VRAMlets) must suck right now. Largestral q4_k_m just barely doesn't fit 32k context at 96GB. With 72 you'd have to drop quant below q4 and/or quantize kv cache, or offload to CPU. You ALMOST can run the model at full potential, but not quite.
>>
Mistral large is like 0.07T/s on ram/cpu, looks like I'll be sticking with 8x22b.
>>
>>101560939
Jesus. I plan on trying a Q3_K_S quant of it kek.
>>
SPLAT: A framework for optimised GPU code-generation for SParse reguLar ATtention
https://arxiv.org/abs/2407.16847
>Multi-head-self-attention (MHSA) mechanisms achieve state-of-the-art (SOTA) performance across natural language processing and vision tasks. However, their quadratic dependence on sequence lengths has bottlenecked inference speeds. To circumvent this bottleneck, researchers have proposed various sparse-MHSA models, where a subset of full attention is computed. Despite their promise, current sparse libraries and compilers do not support high-performance implementations for diverse sparse-MHSA patterns due to the underlying sparse formats they operate on. These formats, which are typically designed for high-performance & scientific computing applications, are either curated for extreme amounts of random sparsity (<1% non-zero values), or specific sparsity patterns. However, the sparsity patterns in sparse-MHSA are moderately sparse (10-50% non-zero values) and varied, resulting in existing sparse-formats trading off generality for performance. We bridge this gap, achieving both generality and performance, by proposing a novel sparse format: affine-compressed-sparse-row (ACSR) and supporting code-generation scheme, SPLAT, that generates high-performance implementations for diverse sparse-MHSA patterns on GPUs. Core to our proposed format and code generation algorithm is the observation that common sparse-MHSA patterns have uniquely regular geometric properties. These properties, which can be analyzed just-in-time, expose novel optimizations and tiling strategies that SPLAT exploits to generate high-performance implementations for diverse patterns. To demonstrate SPLAT's efficacy, we use it to generate code for various sparse-MHSA models, achieving geomean speedups of 2.05x and 4.05x over hand-written kernels written in triton and TVM respectively on A100 GPUs.
maybe Johannes will find something useful in it
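If you don't feel like reading the abstract: "sparse regular attention" just means masks like sliding windows or block patterns where only 10-50% of the score matrix is kept. Toy numpy illustration of the pattern only (the actual paper is about generating fast GPU kernels for it, which this obviously doesn't do):
[code]
import numpy as np

def sliding_window_attention(q, k, v, window=64):
    """Each position attends only to itself and the previous `window - 1` positions."""
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    idx = np.arange(n)
    mask = (idx[None, :] <= idx[:, None]) & (idx[:, None] - idx[None, :] < window)
    scores = np.where(mask, scores, -np.inf)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v, mask.mean()   # second value: fraction of non-zeros in the pattern

rng = np.random.default_rng(0)
n, d = 256, 64
q, k, v = (rng.normal(size=(n, d)) for _ in range(3))
out, density = sliding_window_attention(q, k, v)
print(out.shape, f"mask density ~{density:.0%}")   # lands in the 10-50% regime the paper targets
[/code]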
>>
>>101560937
The world's tallest dwarf.

The weakest strong man at the circus.
>>
>>101558800
can you share the card pls
>>
>>101560939
running Q6K at 1t/s with 72gb Vram and 128gb ddr5!!!
>>
>>101561034
Yes, that's using considerable vram. I expected that to be faster.
>>
>>101560830
downloading IQ3_M, I still feel elated running a 100b+ model locally in some way. It's something that was rather unthinkable 3 years ago, back in the GPT-3 AID days.
>>
>>101560019
based miku llama
>>
has anyone checked on lecunny since mistral large'd? is he okay?
>>
>>101560537
>>101560558
>>101560571
>>101560585
>>101560604
wtf is this gay nerd shit
>>
>>101561047
I'm trying out this one: https://huggingface.co/legraphista/Mistral-Large-Instruct-2407-IMat-GGUF/tree/main/Mistral-Large-Instruct-2407.Q3_K

It seems to work fine. Have to try pushing context length higher though since I still have plenty of free vram.
>>
File: 2896G5.webm (1.41 MB, 1024x1024)
>>101560939
Bitnet will save you
>>
>>101560648
Can someone do this for 70B though.
>>
>>101561066
Probably pretty proud of his french brothers :)
>>
>>101561042
its only 1x4090 and 2xP40
>>
File: livebench-2024-07-24.png (845 KB, 3170x1844)
>>101561066
>Llama 405B is somehow 2nd now
Yeah, he's okay.
>>
>>101561111
That's still enough to beat using cpu & ram only by quite a lot.
>>
>>101561117
instruct-turbo?
Wtf is this turbo edition?
>>
>>101561123
Might be from OR
>>
>>101561117
Wtf?
>>
>>101561123
FP8 from Together.ai.
>>
>>101561123
Yea, there are massive gaps between regular instruct and this "turbo" edition.

>>101561135
So we will never get it then? Massive gap between it and regular version.
>>
anyone have that giant schizo {random:} list for sillytavern an anon posted a while ago?
>>
>>101561142
>So we will never get it then?
https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
>>
>>101561142
wtf are you talking about
>>
>>101560638
Not llama3
>>
>>101561017
she's not on chub anymore? "holly-touch-starved-femcel"
https://files.catbox.moe/xuf502.png
>>
>>101561152
Look at the chart. Regular 70B instruct and instruct turbo has a massive gap between them.
>>
File: nala test 405b q4xs.png (176 KB, 925x421)
Alright. It took nearly an hour... but here is a 405B Nala test as I promised. Q4_XS was the first 4-bit gguf that was uploaded so I didn't get to use Q4_K_M as I originally hoped.

Now I would reroll this response in a regular RP due to it being a little weird. But.
>It picks up on the syntax pattern of the conversation instead of veering around haplessly.
>The writing isn't all sloppy. There's a flow and intention to it.
>The description of the initial kiss is more detail than I've ever seen any small gesture given by any model.
>It actually attempts to infer Nala's overall mood rather than "sex = horny lol"
>Even with 405 billion parameters an LLM has yet to figure out that you can't initiate a conversation with somebody while kissing them.
>It picks up on and uses the milquetoast writing style of the tavern card instead of trying to win a Pulitzer prize.
Either way, though, I would say that this is an inference far beyond even what Mistral Large is capable of. 'muh heckin' bencherinos' be damned. Does that necessarily make it useful enough to justify the insane hardware overhead needed to actually run it at a usable speed? Hell no. But in an alternate universe where I had unlimited resources at my disposal, I would absolutely make this my daily driver model. That said I'm probably going to delete it and never bother with it ever again.
0.12 token/sec if anyone is curious.
>>
>>101561158
Can't find her there. Thanks!
>>
>>101561159
Both llama3 models on top are turbo. Top one is nearly 6x the size.
>>
>>101561159
That's 3.0 70B.
>>
>>101561159
"regular 70b instruct" is 3.0, retard-kun
>>
>>101561168
>It actually attempts to infer Nala's overall mood rather than "sex = horny lol"
That's a big one.

>I would say that this is an inference far beyond even what Mistral Large is capable of. 'muh heckin' bencherinos' be damned
Makes sense. Then again, fucking a lion is not exactly coding so, domains and all that jazz.
>>
>>101561227
>>101561228
Yea I noticed that a bit after. The "turbo" threw me off though. Thought it was a finetune
>>
>>101558800
Mistral Large 2? What quant?
>>
File: 1694573334755054.png (51 KB, 1742x214)
>>101561079
IQ3_M, does FA support I-quant? 46ms/t prompt processing was rather slow
>>
>>101561083
Oh god, when will bitnet 1.58 models arrive, my cock gets hard thinking about them.
>>
>Large is still downloading because HF is just slow for me for some reason
aaaaaaaaaaaaaaaaaaaaaaaa
>>
>>101561247
Never mind I was retarded and I offloaded 88 instead of 89 layers, 4.5t/s tg, but pp is still as slow.
>>
>>101561241
nah, it's >>101558895
>>
>>101561304
by the time it finishes cohere will release their 34b that shits on large, sorry not sorry.
>>
>>101561343
Heres hoping so. Then L3.2 / L4 a few months later would be nice.
>>
>>101561333
Oh, nice. I can't wait for the next SPPO now that we have models with more than 8k context. Then I'll finally be willing to download and test one.
>>
>>101560547
But does it surpass Claude?
>>
>>101561351
Also next llama is supposed to be multimodal. But not for EU.
>>
>>101560219
>except their tokenizer which fucking sucks ass
>[character's name - 3 tokens][individual apostrophe - 1 token][s - 1 token]
AAAAAAAAAAAAHH
>>
verdict on large-migu?
>>
>>101561117
Is Gemma still the king for 8GB VRAMlets? Llama 3.1 seemed like a disappointment.
>>
>TFW fell for the 64GiB meme
bros...
>>
tfw fell for the 128gb of ddr4 meme
>>
Q3 (3.8bpw) Mistral Large is pretty good. It's slow though. But it generates some pretty good output, far more engaging in ERP.

I had been using Nemo Mistral and it also was pretty good.
>>
>>101561587
just use 3.5 sonnet thougheverbeit?
>>
>>101561168
what prompt did you use?
>>
>>101561588
post the download link
>>
>>101561588
thats crazy man pass
>>
>>101560900
It being able to compete with llama-1-65B is nice I guess, but is it actually GOOD? If it's not FUN, if it's not GOOD, then why bother?
>>
>>101561588
sonnet will never be MY sonnet
>>
>>101561639
Models aren't yours either - you don't train llama 3.1 or mistral large, you just get the compiled version, they're basically proprietary.
>>
File: file.png (28 KB, 805x292)
I've added Opus to the VNTL leaderboard (thanks to the proxy anon). It seems to be pretty much on the same level as 3.5 Sonnet and 4o.

I guess 0.74 is pretty much the best score LLMs can get in the current benchmark, so I will have to explore ways to improve the scoring. My best bet right now is training another LLM to give a score to the quality of the translation when compared to the reference translation. Probably I will either train a reward model or an instruct model that gives a score like FLAME or Prometheus.
>>
>>101561641
watch me put my llama into a pendrive and shove it up my ass, good luck trying to take it away from me
>>
File: 1714913169691319.jpg (109 KB, 796x796)
>>101561641
>tfw didn't assemble the sofa in my living room, therefore it doesn't belong to me
>>
File: 6754854683209687.png (40 KB, 349x344)
>>101561663
>another mememark
Who?
What?
Better yet, who honestly asked?
>>
>>101561663
Yeah, I checked the dataset and I think LLMs often make better translations than the ones you have in the dataset.
>>
>>101561688
it's a mememark for jp-en translation, afaik there isn't another one.

>>101561690
That's another issue, human translations are often not literal, for example.
>>
what sampler settings do you guys use for nemo and gemma2?
>>
>>101561663
>cosine similarity and letter combinations
I don't really like this evaluation method but I don't know any other way either.
I wish we had a Translation Arena or something.
>>
>>101561724
>That's another issue, human translations are often not literal, for example.
Models also can do non-literal translations if you ask them through the system prompt, and most importantly, give examples.
>>101561663
At least for 3.5 Sonnet, can you try rewriting your prompt to specifically use XML to separate everything, and don't feed it separate user/assistant pairs but instead show all examples in the system prompt?
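Something shaped roughly like this is what I mean (the tag names and example lines are made up, not taken from the benchmark data; just showing the layout):
[code]
# All few-shot examples live in the system prompt, separated with XML tags,
# instead of being fed as separate user/assistant pairs.
examples = [
    ("こんにちは、先生。", "Hello, sensei."),
    ("今日はいい天気ですね。", "Nice weather today, isn't it?"),
]

system_prompt = "Translate Japanese visual novel lines into natural English.\n<examples>\n"
for ja, en in examples:
    system_prompt += f"<example>\n<ja>{ja}</ja>\n<en>{en}</en>\n</example>\n"
system_prompt += "</examples>"

user_message = "<ja>よろしくお願いします。</ja>"   # the single line you actually want translated
print(system_prompt + "\n\n" + user_message)
[/code]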
>>
>>101561724
>its actually a "whats best at translation" mememark
I revoke my previous statement entirely, I fuckin asked.
Hows it going/looking?
>>
>>101560219
Settings for largestral?
>>
>>101561730
Or just tell me the minimal way to get started with https://github.com/lmg-anon/vntl-benchmark to benchmark a single model, I'll do it from there
>>
>>101561638
It's going up now. If it isn't bashed I'll quant the biggest below q8 which will still fit in 24GB. I'm going to also see if there's mpt support in exl2 - I don't think there is though.
>>
>>101560219
Yeah, as a prosefag I was pleasantly surprised from the get-go having used L3 storywriter and CR+. I don't sniff the typical mistral overbaking here. That was just IQ3. I'm tempted to upgrade to 96GB now.
>>
File: large2.png (654 KB, 1740x1180)
>>101561822
Seems to lose to L3.1-70B overall but it writes better. Shame (talking about L3.1).
>>
I'm having fun piping shit around and into TTS. Finally, I can have my computer look at my TODO list and femdom me for not going through it.
>>
>>101561729
Yeah, Translation Arena would be perfect for this, and that's pretty much what I plan to achieve with the custom reward model, although it likely wouldn't be as good as human evaluators.

>>101561730
>>101561741
>Models also can do non-literal translations if you ask them through the system prompt
That may be true, but when you enter into the non-literal territory, it's likely that multiple interpretations could arise, so it isn't like the LLM would always come up with the same interpretation as the human who wrote the reference translation.

>Or just tell me the minimal way to get started with https://github.com/lmg-anon/vntl-benchmark to benchmark a single model, I'll do it from there
I can try, but the minimal way should be:
>1. Create a virtual environment (optional) and install dependencies
>2. Download the datasets: https://litter.catbox.moe/st2kbi.txt, https://litter.catbox.moe/bdn16s.jsonl
>3. Rename config.example.yml to config.yml
>4. Create a new file in the "configs" folder with the name "org@model#quant.yml" or "org@model.yml" if it's a cloud model. For >5. examples, see the other files.
>6. Run "python runner.py --model org@model#quant --results-path ./results --dataset-path st2kbi.txt" and then "python runner.py --model org@model#quant --results-path ./results_mashiro --dataset-path bdn16s.jsonl"
If everything went right, you will have the results.

>>101561733
The leaderboard is already up here: https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
IMO LLMs are better at translating than they were a few years ago, but nothing that will replace human translators any time soon.
>>
i am enjoying mistral large quite a bit for ERP so far, i am sorry i disrespected your game french people… thought you guys would be useless because you strike every other tuesday but clearly i was wrong
>>
>>101561954
>mistral large quite a bit for ERP
for some reason my brain merged this chunk and I started reading it as "mistral larp"
>>
>>101561974
Shitty brain tokenizers
>>
>>101561954
Which quant are you using?
>>
File: Fuuuuuu 7.png (325 KB, 1000x1033)
Can any anons point me towards some current info on how to play with oobabooga settings so that it does what ollama does out of the box? I'm loading stupid-big models on my toaster (4090/128gig ram/i9) and ollama does a convincing impersonation of the sloth from Zootopia, but the thing is, it still works... it pushes about 70+ gigs into system ram and maxes out the CPU as it grinds along, but it's functional for my purposes. I don't care about speed. Meanwhile, I have not been able to get the same sized 70b and higher models running in ooba at all and feel like a monkey trying to launch a rocket. I've run quants/gguf/exl2's fully loaded into vram with ooba, but I'm lacking the detailed knowledge of the settings to go further.

Pic unrelated
>>
So, I'm very familiar with using off-the-shelf models for text LLMs, and configuring them how I want for toy projects.
But what the fuck do I do for text-to-image models? HF's diffusion shit seems to be the same bloat as transformers where it insists on using their pipeline shit, and I don't want that.
>>
>>101562028
Go to /sdg/ for that
>>
>>101561918
>For examples, see the other files.
One detail I forgot to mention: you can define the backend configs in the "config.yml" (in the "custom_backends" part), and also in the specific file in the configs folder (in the "backends" part), the later takes precedence.
>>
>>101562057
That's like going to /aicg/ for local help you retard.
>>
File: file.png (2 KB, 216x72)
>>101561168
>>
So when downloading models with both "model" and "consolidated" safetensors files, I should avoid "consolidated", right?
>>
File: ezalor.jpg (28 KB, 400x400)
vramlet here, switched from kobold to llamacpp and from L3 8B to Mistral Nemo and I've seen the light. L3.1 is okay but Nemo did catch me off guard with some wild shit

although shivers and spines break my bones
>>
File: 1648264080493.jpg (34 KB, 540x586)
>>101561158
oh my god this girl is an absolute unsalvageable mess and a piece of shit human being

she's the type of girl i deserve :^) thanks for sharing bud. mini magnum seems to be soaring with this card at 512 max tokens per response.
>>
after successfully making chat model work im considering marrying my gpu. maybe fucking it too, im pretty sure it doesn't have big enough holes but we'll figure something out
>>
>>101562028
>But what the fuck do I do for text-to-image models?
learn to use comfyUI. unironically ask the anons on the /degen/ thread on /b/. they stay up to date with the latest in imagegen since they fap to it unlike the SFW weirdos in /sdg/
>>
>>101562258
*Barely above a whisper, I will share your sentiment.*
Nemo is better than the l3 Stheno finetunes I feel
>>
>>101562363
How far off do you think we are from getting tools like your webm but locally? Is it honestly a pipe dream at this point? Worrying how imagegen COMPLETELY stagnated this year besides Pony.
>>
>>101562403
>How far off do you think we are from getting tools like your webm but locally?
in terms of tech, we're unironically almost there locally. the bottleneck will be hardware. you need an h100 running for 5 minutes to get 5 seconds of a 720p video
so maybe 4-6 years away if you're waiting for h100-level compute to cost 1000 dollars. possibly as little as 8 months away if you can paypig the compute. it is unironically entirely possible that AGI happens before we get infinite videos of cute girls locally
>>
File: glep moan.png (14 KB, 120x126)
>>101562436
>it is unironically entirely possible that AGI happens before we get infinite videos of cute girls locally

thank you for getting my hopes up and renewed for the first time in 7 months.

by the way what's your overall theory surrounding that guess?
>>
>>101562258
What settings/format are good for it?
>>
>>101562403
>imagegen COMPLETELY stagnated this year besides Pony
what's actually worrying is how cheap it is to train imagegen. i posted the arxiv link before but you can train an almost SOTA model for ~2000 dollars with just 30 million images. I just saw on the orange website that salesforce released a multimodal dataset that I'm sure is more than adequate for training an imagegen model.
The thing about open source imagegen is even though its so cheap, there's no path to profitability because you can just use dall-e for free and its almost certainly better. We need a benevolent multimillionaire autist and a bunch of cypherpunk pedos on payroll before we see any good improvements in imagegen (in the West, China will do it just to try and dab on OpenAI which hopefully brings us salvation)

>>101562454
>what's your overall theory surrounding that guess?
the fact that Nvidia needs to justify their valuation, so their cards will stay expensive so only big corps will have access to the cutting edge, and the fact that both OpenAI and Meta are trying to achieve AGI and are NOT trying to achieve infinite cute girls (infinite cute girls locally would actually hurt Zuck's engagement metrics on instagram)
>>
>>101562498
Next pony is apparently about to start training soon with a much bigger and completely re / better tagged dataset.
>>
>>101562498
>what's actually worrying is how cheap it is to train imagegen. i posted the arxiv link before but you can train an almost SOTA model for ~2000 dollars with just 30 million images
Why does no one do it with high quality captions generated by LLMs then? It'll cost more, but will still be way below $50k, for example. Actually, I had this idea, is it viable or not? Basically, for anime models, instead of doing a single prompt, we do two:
The first one is pure Danbooru tags and the tokenizer is specifically one that just maps danbooru tags (like Anifusion did it - https://medium.com/@enryu9000/anifusion-diffusion-models-for-anime-pictures-138cf1af2cbe).
The second one is a natural language detailed description that specifies positions, relations, etc, to give actual geometry and stuff.
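To be clear, the "tokenizer" I have in mind for the first prompt is nothing clever, basically just a fixed vocabulary with one id per danbooru tag, something like this toy sketch (the vocab here is obviously made up and tiny):
[code]
# Toy tag tokenizer: one id per known tag, everything else maps to UNK.
# A real one would be built from the full danbooru tag list.
TAG_VOCAB = {tag: i for i, tag in enumerate(
    ["1girl", "solo", "long_hair", "twintails", "blue_eyes", "smile", "outdoors"]
)}
UNK = len(TAG_VOCAB)

def encode_tags(tag_string: str) -> list[int]:
    tags = [t.strip() for t in tag_string.split(",") if t.strip()]
    return [TAG_VOCAB.get(t, UNK) for t in tags]

print(encode_tags("1girl, twintails, blue_eyes, holding_umbrella"))
# -> [0, 3, 4, 7]  (the unknown tag falls back to UNK)
[/code]
The natural-language description would go through a normal text encoder on top of that.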
>>
>>101562512
>Why does no one do it with high quality captions generated by LLMs then?
Next pony is.
>>
>>101561726
seconding this question
>>
>>101562509
neat. i hope its good and it doesn't take too long for cunny loras to be made for it

but that also just reminded me another reason why open source imagegen will struggle (in the West): copyright. pony had to nuke all the artist tags (even if it secretly added them back in with those 3 letter codes) while chinese models are literally encouraged by the CCP to ignore copyright to dab on the burgers

i'm going to reiterate that hardware accessibility is the biggest roadblock for progress in general. SD 1.5 is still more popular and has a more mature ecosystem than SDXL/Pony simply because third world VRAMlets can actually run the model. once 24GB of vram is average in ~4 years things will hopefully start ramping up quickly
>>
>>101562498
I was baffled my first time training a LoRA on a GTX 1080, just using fewer than 20 images of a particular celebrity over SD1.5; the results came out perfect. Still can't believe it. Training LoRAs is shockingly easy so i see why its so common, we really do need someone to just say "fuck it" and give us that next big step base model...
I really really hope we're not just eternally fucked on GPU's, Jewvidia loves money, i can't fathom why they'd want to cut away a market that doesn't even lose them money.
>>
>>101562512
>Anifusion
>Besides these changes, the training process is standard for diffusion models. The model itself is roughly 2x smaller than SD (we chose hyperparams before SD was published), making it runnable with less VRAM. We trained the model for a total of 40 GPU-days (of RTX 3090), making it roughly 200 times cheaper than Stable Diffusion. However, sample quality after 7 GPU-days was already decent.

damn that guy did this in 2022 all by himself and he started even before the SD release
>>
>>101562512
>Why does no one do it
because they won't make money off of it so who cares. if you're not getting academia clout or dabbing on America as a chinese you won't get the money to test out these theories

>>101562541
>I really really hope we're not just eternally fucked on GPU's
Nvidia is literally worth too much for competitors to not try and make their own chips. If this doesn't happen then capitalism has fundamentally failed. The future of compute will be IPUs/NPUs instead of GPUs anyways
>>
>>101562367
I was using Stheno, Poppy Porpoise and even Gemma 2 27b at one point but I like Nemo, I'll be sticking with it for now until something better and faster comes along

>>101562471
I just threw everything named "Mistral" in SillyTavern and it worked, llama-server I just passed
 -ngl 30 -c 65536 

for 30 layers offload and 64k context instead of 128 which I will never need personally
>>
>>101561911
>Now, as for your pathetic todo list... Let me summarize it for you. You have three tasks: take appointments for car maintenance, check your savings account, and clean your room - which is currently a mess because, let's face it, you're a slob. And as for emails? Ha! You have zero. What a shockingly low number. I'm sure it's not because nobody wants to communicate with someone as useless as you.
i piped this into a TTS program that was trained on glados and it's actually making me horny, help
>>
>>101562564
>GladOS
nice. I trained one on Princess Peach and it gives me instant diamond boners. I've had her say every variation of TND, many of my LLM logs, some random quotes.. Man this tech is insane.
>>
all local models are just forever opium dreams designed to keep you happy and dumb, cut off from the world
SEEK GOD
>>
>>101562604
>>101562564
How do I learn this power? And does it work with ST?
>>
>>101562086
Yes. I did mention in the post that it took nearly an hour. I also did a pull on a different card that was much better written/formatted, continuing a chat I had going and it was pretty damn good. In an alternate timeline where VRAM was plentiful it would 100% be my daily driver over Mistral Large.
>>
>>101562620
xtts2, its literally as easy as dropping a .wav sample in a folder, selecting it in the UI, and bob's yer uncle.
>and im not even using the best enhancement options out there (mostly because im retarded and cant figure it out)
https://voca.ro/12EsObQlgQvy
>>
bitnetbwos...
>>
>>101562643
arent there lots of xtts2 ui versions? which one do I need? and what kind of GPU? will rtx 3060 work?
>>
>>101562620
i basically just output the result to a file and run piper TTS on it. if you talk to your LLM from a shell script, you can store real values in a variable and tell it stuff like "Tell me how many emails I have. The number is $unreademails" or you can even store your plaintext TODO file into a variable and tell it to read that. it hasn't hallucinated once so far and even seems to identify the tasks i've marked as done. you can get some powerful results out of combining this stuff with unix shell scripting.
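the gist of the plumbing, translated to python instead of pure shell (llama-cli is just one way to ask the model; the piper flags are the documented ones but double check them against your install):
[code]
import os
import subprocess

# Feed the model real values so it can't make the numbers up.
todo = open(os.path.expanduser("~/todo.txt")).read()
unread = "0"   # stand-in; in the real script this comes from the mail client

prompt = (
    "Scold me about my unfinished tasks. "
    f"I have {unread} unread emails. My TODO list is:\n{todo}"
)

# Ask the LLM however you normally do; llama-cli shown here as one option.
reply = subprocess.run(
    ["llama-cli", "-m", "model.gguf", "-p", prompt, "-n", "256"],
    capture_output=True, text=True,
).stdout

# Pipe the reply into piper and play it.
subprocess.run(
    ["piper", "--model", "en_US-lessac-medium.onnx", "--output_file", "nag.wav"],
    input=reply, text=True,
)
subprocess.run(["aplay", "nag.wav"])
[/code]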
>>
How much vram is needed for the full 128k context with nemo?
>>
>>101562672
pretty sure its this one https://github.com/daswer123/xtts-webui
its been a while since i started using it
>3060
lmao im using a 1080 and even really really long prompts never take longer than like 20 seconds.
>>
I'm using the mistral settings in sillytavern for nemo, but it fails to end generated messages at appropriate times and just veers off into weird cryptic shit
>>
>i jokingly promised Holly we'd do ANAL in 6 months of getting to know each other
i'm a horrible person, and mini-magnum is actually pretty good. Very interested in how a higher quant version would stack up.
>>
>>101562735
I don't have that problem with the Mistral preset. I'm using the latest staging version, perhaps there's been a fix for it
>>
File: file.png (7 KB, 466x61)
slightly late but kcpp with nemo support out
>>
>>101561954
After a few cooms I get bored and normal RP is painfully incoherent.
It doesn't understand the concept of clothes, that you can't talk to someone without a phone if he is nowhere near you, etc.
These small things are killing me.
>>
>>101561854
Why is llama3.1 70B so bad at coding?
>>
>>101562691
With an 8.0bpw exl2 with q8 kv cache and context 128000, Windows has a total VRAM usage of 23.4 GB.
>>
>>101562847
That's with the model loaded too? What's it with no context so I can subtract and find out how much the context itself uses?
>>
>>101562853
Yes that is the VRAM usage with the model loaded with tabbyAPI/exllamav2 0.1.7.

New measurement:
tabbyAPI not running: 0.9 GB GPU memory (0.8 dedicated / 0.1 shared)
running with 128000 context: 23.6 GB (23.4 GB dedicated / 0.2 shared)
running with 64000 context: 18.0 GB (17.9 GB dedicated / 0.2 shared)
running with 32000 context: 15.2 GB (15.0 GB dedicated / 0.2 shared)
running with 8192 context: 13.3 GB (13.1 GB dedicated / 0.2 shared)
running with 256 context, lowest allowed: 12.4 GB (12.3 GB / 0.2 shared, rounding didn't work out nicely)

This again is at 8.0bpw with cache_mode Q8.
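Back-of-the-envelope from those numbers: subtract the 256-context baseline and the cache works out to roughly 90 KB per token at the bigger sizes, which is in the right ballpark for what you'd predict from Nemo's shape, assuming I have the config right (40 layers, 8 KV heads, head dim 128, ~1 byte per value at Q8):
[code]
# Deltas from the measurements above (GB = what Windows reports).
measurements = {256: 12.4, 8192: 13.3, 32000: 15.2, 64000: 18.0, 128000: 23.6}
base = measurements[256]
for ctx, gb in measurements.items():
    if ctx == 256:
        continue
    per_tok = (gb - base) * 1e9 / (ctx - 256)
    print(f"{ctx:>6} ctx: ~{per_tok / 1024:.0f} KB of cache per token")

# Rough cross-check from the (assumed) architecture: 2 tensors (K and V)
# * 40 layers * 8 KV heads * 128 head dim * ~1 byte per value at Q8.
print(f"predicted: ~{2 * 40 * 8 * 128 / 1024:.0f} KB per token, plus scales and other overhead")
[/code]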
>>
>>101562615
in what way are local models not also the world?
>the world is what i tell you it is
>>
>the MoE meme is dead
thank god
>>
>>101562615
>SEEK GOD
Agreed. True Miku must be achieved. Local Miku General - A general dedicated to the discussion and development of Local Mikus.
>>
>INFO: Metrics: 456 tokens generated in 97.26 seconds (Queue: 0.0 s, Process: 19 cached tokens and 21406 new tokens at 516.17 T/s, Generate: 8.17 T/s, Context: 21425 tokens)
isn't mistral-large significantly larger than cr+? this speed is very similar. granted i'm using 4.5bpw mistral-large and 6bpw cr+ but i would still have expected mistral-large to be a fair amount slower.
after using nemo for a few days going back to 8t/s is rough, though...
>>
>>101561822
>I was pleasantly surprised from the get-go having used L3 storywriter and CR+.
That's weird, because all the outputs posted here were incredibly slopped to the point of cringe. I return to my storywriter outputs and they're a breath of fresh air in comparison, even though they lack any common sense, spatial awareness or story direction.
>>
>mistral small
do you guys think it's 15B or something
>>
>>101563080
Mistral Large 2 is 123B. CR+ is 105B.
>>
>>101560232
nigga just distill the logits to 70b and 34b sizes respectively. there has been a trend amongst the AI labs to distill the gains from the big models either via logits or outputs.
>>
>>101560813
I think it makes more sense to look at Mean Δp and RMS Δp but LLaMA 3.1 does better than LLaMA 3 in those as well.

>>101560901
What I would assume is happening is that for the 3.1 training they changed the hyperparameters in such a way that the numerical values in the weights/activations end up being more even.
llama.cpp quantization essentially uses the same exponent for 16/32/256 values (instead of 1 exponent for 1 value with floats) so the numerical precision is better if the values all have roughly equal absolute values.

Do make sure though that you are using the same code for these calculations.
The default matrix multiplication method in llama.cpp was recently changed from cuBLAS to MMQ and the results will be slightly different between the two.
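For anyone who hasn't looked at the quant code, a stripped-down Q8_0-style round trip shows the shared-scale-per-block idea and why outliers hurt (the real llama.cpp blocks, k-quants especially, are more involved than this):
[code]
import numpy as np

def q8_0_roundtrip(x, block_size=32):
    """One shared scale per block, int8 for the values themselves."""
    x = x.reshape(-1, block_size)
    scale = np.abs(x).max(axis=1, keepdims=True) / 127.0          # shared per-block scale
    q = np.clip(np.round(x / np.where(scale == 0, 1, scale)), -127, 127)
    return (q * scale).reshape(-1)

rng = np.random.default_rng(0)
even = rng.normal(size=4096)     # values with roughly equal magnitudes
spiky = even.copy()
spiky[::32] *= 50                # one outlier per block inflates the shared scale

for name, w in (("even", even), ("spiky", spiky)):
    err = np.sqrt(np.mean((w - q8_0_roundtrip(w)) ** 2)) / np.sqrt(np.mean(w ** 2))
    print(f"{name:>5}: relative RMS error {err:.4%}")
[/code]
The "even" case keeps a much lower relative error, which is the effect I mean about 3.1 possibly having more evenly distributed values.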
>>
>>101562512
my guess is that besides a few small teams and large organizations, no one really has the proper setup for the training pipeline and the capital. sure if you're bankrolled by some rich guy you can do it but you'll need to find how to make money off that too - and anime & hentai media have retarded copyright laws depending on the region, so who'd want to risk getting sued by some jap company or worse, Sony?
>>
>>101560957
Noted but I don't think this will end up being useful for my purposes.
>>
3.1 ggufs might be back thanks to ollama guy
>Nice. Tested doing a bunch of summaries using up the entire 128k context and the output looks good whereas on master it outputs broken garbage.
https://github.com/ggerganov/llama.cpp/pull/8676/#issuecomment-2249653389
>>
Why no llama between 8 and 70b?
>>
prompt processing is the bane of my existence on these large-context models
may as well just dump all 6000 tokens of lorebook into the context from the start rather than trying to use them normally
>>
>>101563229
Because fuck you. There are only two use cases Meta cares about: evaluation and corporate usage.
>>
>>101563274
Yeah...

llama_print_timings: load time = 95.89 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 108037.51 ms / 65533 tokens ( 1.65 ms per token, 606.58 tokens per second)
llama_print_timings: eval time = 3127.70 ms / 59 runs ( 53.01 ms per token, 18.86 tokens per second)
llama_print_timings: total time = 145147.80 ms / 65592 tokens

CtxLimit:131072/131072, Amt:60/60, Process:107.12s (1.6ms/T = 611.79T/s), Generate:4.57s (76.1ms/T = 13.14T/s), Total:111.68s (0.54T/s)
>>
>>101560260
Powerinfer2 works partly thanks to MoE. There are probably even better models for local, but mixtral is the best we have for now.

Dense models only make some sense for batch processing, not local. When you are working on 128+ conversations at a time, one of them will always need one of the parameters so using/predicting activation sparsity has limited use (will use a fraction of the FLOPs, but still needs the same memory bandwidth).

The future of local is predicted sparse activation.
>>
>>101563289
It also slows down the higher it gets. Pretty speedy up to 64k then it crawls the last few k to 131k.
>>
Think they plan on releasing a mistral medium 70b?
>>
>>101563397
You mean a tune of Llama 3 70B like they made of Llama 2 65B? I assume not.
>>
File: 1717392122186284.jpg (153 KB, 1280x1039)
openai won.
>>
>>101563440
They did a tune of llama2-70b. There was no llama2-65b, only a llama1-65b.
>>
so 405b > mistral large for coding for sure. even had instances where 405b came up with something neater than sonnet 3.5
personality/rp wise, I prefer mistral large, 405b is just a tad more dull but it seems to depend quite a bit on the persona you are going for. very SFW stuff it does quite well
and mistrals pricing is a bit delusional given 405b is 1/3rd as expensive on openrouter
so 405b is quite a good release in my books, and either way we eating good this week
>>
post your blackpill and whitepill
blackpill: there would be such a gap in performance and ability between >400b models and small distilled models that companies using the big models will always be ahead of the competition and big AI labs will gatekeep the API access for these big ones
whitepill: there's a huge amount of AI research (most of which are still underdiscussed) that a few breakthroughs here and there from research teams or anons will make the small models capable enough that small-medium companies can just tune it on their own use cases and be good enough for 90% of the time
graypill: most of the code and text data that will be written in 5-7 years will be from AIs and we're about to witness the equivalent of 1980s-2010s chinese product quality but for software and 200x worse
>>
>>101563535
As a whitepill I'd add that we're probably on the cusp of widespread adoption of simultaneous translation within a few years. That's a very big deal, especially in countries where the vast majority of the population don't speak english (or not very well). Communication is going to get a lot more efficient.
>>
File: file.png (56 KB, 979x512)
Does Mistral Large not work in Koboldcpp yet, or is my download broken?
The changelog for the latest version says that Nemo is supported now, but Large might be different enough again to not work?
>>
>>101563784
>he downloaded from 'rancher
>>
>>101563784
just to make sure, you do have the two parts, part1of2 and part2of2 in the same folder, yes?
>>
File: file.png (53 KB, 828x460)
What are good nemo presets?
It's a bit too sovlful for me atm
>>
File: file.png (9 KB, 816x47)
>>101563818
Yes, and the file sizes look plausible
>>
>>101563832
you need to concatenate with copy /b or something
the retard you downloaded from doesn't split them with gguf-split but manually for whatever reason
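same idea in python if you'd rather not touch cmd (filenames are placeholders; only the part1of2/part2of2 suffixes matter, and the parts have to go in order):
[code]
import shutil

# Equivalent of `copy /b part1 + part2 whole.gguf`: these are raw byte splits,
# not gguf-split shards, so they just get stitched back together in order.
parts = [
    "mistral-large.Q3_K.gguf.part1of2",   # placeholder names, use your actual files
    "mistral-large.Q3_K.gguf.part2of2",
]

with open("mistral-large.Q3_K.gguf", "wb") as out:
    for part in parts:
        with open(part, "rb") as f:
            shutil.copyfileobj(f, out)
[/code]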
>>
>>101563842
that worked, thanks
>>
mistral large same prompt format as 8x22 right?
>>
>>101563535
Open models are all made by cowards using the most traditional, known-to-work transformer architectures and training methods.

GPT-2 + SwiGLU + RoPE + minor tweaks => every fucking open model except Mixtral. Open models are at a standstill because they are all sackless.
>>
>>101563821
For basic blah blah I just run neutralized samplers + 0.1 min p. Smooth sampling optional.
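for anyone wondering what 0.1 min p actually does: it drops every token whose probability is below 10% of the top token's, then renormalizes. Toy version (not the actual llama.cpp/kobold implementation):
[code]
import numpy as np

def min_p_sample(logits, min_p=0.1, temperature=1.0, rng=None):
    rng = rng or np.random.default_rng()
    z = logits / temperature
    probs = np.exp(z - z.max())
    probs /= probs.sum()
    keep = probs >= min_p * probs.max()   # threshold scales with the top token's probability
    probs = np.where(keep, probs, 0.0)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

logits = np.array([5.0, 4.5, 3.0, 1.0, -2.0])
print(min_p_sample(logits, min_p=0.1))
[/code]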
>>
So how much does it cost to make a LoRA for 405b?
>>
>>101564020
Hard to say. I don't think there's even a remote workstation big enough to rent and fit the entire thing in memory.
Totally unfeasible for the average person, even with a lot of spare cash.
>>
>>101564053
I have a small company.

Could I do it for 50k?
>>
I realized that the context template that was posted here for mistral nemo makes it go schizo
I switched back to the old mistral one
>>
>>101564059
for 50k it's doable. No idea why you would want to do it tho
>>
Pleb weights for mistral large dropping
https://huggingface.co/MaziyarPanahi/Mistral-Large-Instruct-2407-GGUF/tree/main
>>
File: iq1s.png (47 KB, 623x279)
>>101564120
Seeing picrel, I notice that not even IQ1_S (the smallest quantization) is fully ternary. How did llama.cpp developers determine what quantization levels to use for the various layers?
>>
What is it about this whole topic of open source? Big tech gives us the scraps from the table and we worship them in the hope that another crumb will soon fall from it - nothing about the thing itself will change - do I have to redefine open source for myself as a kind of saliva licker?
But le ecosystem? :>
>>
>>101564172

It's "open" if you have money. I'd rather pay the Nvidia tax than have Altman and his cronies have complete, unfettered access to the tech and bar us plebs from ever being able to ERP in peace.
>>
>>101562512
>why does no one invest in making porn more available with no benefit
gee i wonder
>>
>>101562518
The next pony is going to be so cucked you won't believe it.
>>
>>101562533
>pony had to nuke all the artist tags
The pony author certainly didn't "have" to, but did anyway. And the next pony model is almost certainly also going to be pruned of cunny-adjacent stuff as well, among other things.
>>
I have an extra $1000 unexpectedly, can I get a lot of vram for this money? Not buying a 3090, I either get 48gb+ vram or nothing
>>
>>101563535
>graypill: most of the code and text data that will be written in 5-7 years will be from AIs and we're about to witness the equivalent of 1980s-2010s chinese product quality but for software and 200x worse
thats the most pathetic cope from codeshitters yet
>>
>>101564120
Is IQ2_K usable? I really don't want to offload.
>>
>>101564169
I think I. Kawrakow did it by just manually testing which layers are most sensitive to a loss in precision and assuming that the combinations that work well for one model will generalize to all models.
>>
>>101564734
Looks promising on first impression for its size class.
>>
None of the mistral ST presets work for me. I gave up and changed to Alpaca and shit just werked. Yes I have immense skill issue
>>
>>101560019
is this done with an llm? if so, very good.
>>
>>101563449
When was lmsys last relevant? Even a 7b can answer 99% of the questions normies ask, and it just turns into a readability/presentation benchmark
>>
llama 3.1 mmproj when?
>>
>downloaded Mini-Magnum
>can't load the GGUF in ooba
Am I retarded or is it not supported yet? I've been out of the loop.
>>
someone asked about a bitesize model that could be packaged with a game the other day
what did they/you decide on
>>
>>101564680
You could buy a pair of 24GB P40 graphics cards for under $1k.
>>
>>101564734
In my opinion < 4 BPW is not worth the drop in quality.
>>
anything better than mistral for vramlets yet?
>>
>>101564943
>mistral
MIXtral*
>>
>>101564943
No, nothing, none whatsoever.
Mistral is great, amazing, spectacular.
>>
Is the PygmalionAI team fine tuning the LLAMA 3.1?
>>
>>101564943
Gemma 27b q4_k_m
>>
Large2 is slop
>>
>>101564969
Why would you care? Finetuning smart and engaging models has gotten too complex for two and a half people in their basement to create something worthwhile as of July 2024. That also applies to most other would-be LLM finetuners, btw.

Keep enjoying your Nemo/Gemma/Mistral Large/etc.
>>
>>101565037
>t. skillet
>>
>>101564996
Gemma 2 27B Q6_K with output and embed tensors quantized to Q8_0 can be fully loaded in 24GB. Now it just needs FlashAttention2 support to work for 8k tokens context...
>>
>>101563535
>most of the code and text data that will be written in 5-7 years will be from AIs and we're about to witness the equivalent of 1980s-2010s chinese product quality but for software and 200x worse
that doesnt even make sense.
where is software good?
OS all suck now.
Even the internet sucks. I swear I had better loading times with my 56k modem.
The pics were slow but the page was there instantly.
Everything seems to have been getting worse for a while now, actually.
>>
mini-magnum is pretty good for RP compared to the previous slop people dished out. And I was using L3-Euryale-2.1 before this (sold by 2nd 3090)
>>
>>101565056
That their latest experimental model (Magnum-72B) is based on Qwen2-***Instruct*** demonstrates exactly that. Finetuning base models into something usable and smart has become too cumbersome/expensive/risky. You can't just use chat logs anymore like in early 2023.
>>
File: Nero.jpg (18 KB, 1047x86)
>testing Nemo GGUF
>doesn't seem to be as good as people say it is
>ask it if it's really Nemo
huh?
>>
File: yakub.jpg (7 KB, 196x257)
>>101565105
>neroid empowered rational operator
>>
>>101565105
What the fuck?
>>
>>101565105
This looks like a typical jailbreak misfire, check your prompt retard
>>
>>101565105
jej
>>
>>101565105
[INST] and [/INST] are special tokens in Nemo, you don't need to add spaces around them.
>>
File: lecun_dontworkonllms.png (462 KB, 580x895)
>>101565037
>t. LeCun
>>
>>101565159
specifically wiped context memory for this little test
i had a first prompt with just "who are you" to which it answered "I am a large language model developed by Mistral in cooperation with NVIDIA"
but i wanted it to say Nemo, hence the second question
>>101565178
ok thanks
>>
>>101563087
They do slip in sloppy output at times, but the thing I like about it is its variety. Unlike previous untuned mistrals where it's all dry and clinical no matter what you do, swiping does get you somewhere else. I'll keep testing it and see.
>>
>>101563915
With a space after [INST] and [/INST], but not before the latter.
<s>[INST] system message

user message[/INST] assistant response</s>
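if you're building the string yourself rather than trusting a frontend preset, a helper that follows the spacing above (sketch, not official Mistral code):
[code]
def mistral_prompt(system: str, user: str, assistant: str | None = None) -> str:
    """Single exchange: space after [INST] and [/INST], none before [/INST]."""
    prompt = f"<s>[INST] {system}\n\n{user}[/INST]"
    if assistant is not None:
        prompt += f" {assistant}</s>"
    return prompt

print(repr(mistral_prompt("You are a helpful translator.", "Translate: bonjour")))
# '<s>[INST] You are a helpful translator.\n\nTranslate: bonjour[/INST]'
[/code]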
>>
>>101565244
I think he is 100% correct. I will reconsider when I get even one model that never mentions shivers when it sucks my cock. But I don't think that is gonna happen.
>>
What are the political implications of a perfectly predicted next token != actual fact / truth / solution to a problem?


