/g/ - Technology





File: miku-hand-out+.jpg (236 KB, 584x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101556980 & >>101553102

►News
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1685790540409069.jpg (197 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101556980

--Mistral Large Instruct 2407 configuration discussion: >>101558047 >>101558099 >>101558116 >>101558134 >>101558149 >>101558154 >>101558583
--Llama 3.1 models removed from benchmark chart, L3.1 8B praised for accessibility and SOLV: >>101557528 >>101557550 >>101557594 >>101557614 >>101558284
--Hugging Face's profitability and sustainability: >>101557631 >>101557747 >>101557792 >>101557849 >>101558031
--OpenAI offers GPT-4 fine-tuning to tier 4 and 5 users: >>101558481
--Hiding timestamps and showing models in SillyTavern: >>101557105 >>101557166 >>101557243
--Anon shares their regret after paying for Claude and another user shares their experience with multiple AI models: >>101557317 >>101557453
--Mistral Large 2 (2407) performance and prompt template impact: >>101557016 >>101557018 >>101557033 >>101557144 >>101557127
--Logs: Anon shares a chatlog generated by large 2: >>101558282
--Logs: Zhanglii presents a humorous origin story for an internet slang phrase, "Dicks out for Harambe," leading to a lighthearted conversation with Johngi.: >>101557207
--Ollama guy fixes llama 3.1 rope scaling factors: >>101557334
--Llama version 3 naming controversy: >>101558609 >>101558613 >>101558627 >>101558636 >>101558649
--Discussion about cognitivecomputations/dolphin-2.9.3-mistral-nemo-12b-gguf: >>101558208 >>101558545 >>101558600 >>101558652 >>101558563 >>101558574
--Availability and potential issues of new Mistral quants: >>101557744 >>101557771 >>101557891
--hfchat.py script and desire for Mistral Large model: >>101559118 >>101559135 >>101559156
--Comparing 200B parameter models with the human brain: >>101557573 >>101557623 >>101557675 >>101557720 >>101557751 >>101557788 >>101557976 >>101558119 >>101558171 >>101558393 >>101558585 >>101558608 >>101558686 >>101558438 >>101558446
--Miku (free space): >>101557331

►Recent Highlight Posts from the Previous Thread: >>101556983
>>
>>101560019
What a shit recap.
>>
>>101560063
rude
>>
>>101560019
recap anon... are you okay..? you've fallen off...
>>
>>101560123
explain step by step what is wrong with the recap
>>
>>101560019
we've complained so much about companies making slop that we failed to realize we are the slop
>>
>>101560019
I like this Llamiku
>>
>>101560019
>SOLV
I've always been saying how utterly garbage these recaps have been for the past few months, but this really takes the cake.
>>
>>101560145
that's really quite profound
>>
largestral is like CR+ but smarter and better in every way
except their tokenizer which fucking sucks ass and infuriates me as I watch the word "craftsmanship" crawl across my screen split into 3 tokens at a 2t/s clip
but wow the sovl and practical intelligence in RP, this thing feels great
>>
>>101560019
Your recaps are garbage and you're a loser, go touch grass.
>>
>>101560219
If you want a dumber but much faster version Nemo is worth trying. Hopefully mistral ends up releasing a mid sized model.
>>
>>101560219
do we have a new ERP king?
>>
>>101560227
Woops, forgot to remove the trip. Ignore that.
>>
>>101560232
>Mixtral-Instruct-2409
Though it has been relegated to "research" as per
https://mistral.ai/technology/#models
Seems MoE was a meme
>General purpose models
>Mistral Nemo
>Mistral Large 2
>Research models
>Mixtral
>Available in 8x7 and 8x22 sizes
>>
>>101560019
by far your worst recap in months
>>
>>101560268
The thread was also awful to be fair.
>>
Does that Route LLM thing work with models on the local network? or is it either you have ollama running on the machine, or you use a service
>>
>>101560260
>Seems MoE was a meme
MoE is a meme for open-source model releases since barely anyone can afford to run them properly; you might as well release a dense model. After all, you don't get paid for open-sourced models, so they're only as valuable as the audience that can use them.
internally though...
>>
Has anyone tried Meta-Llama-3.1-70B-Instruct yet assistant? It's taking me 3 hours to download.
>>
>>101560406
you can't run it
>>
>>101560240
in my opinion yes, and it isn't particularly close either
before there were a few choices I would bounce back and forth on (CR+, wiz8x22, qwen/magnum) but this kind of smokes all of them tbdesu. sucks how slow it is but I hardly ever want to reroll with this thing, it just gets every single card I throw at it and clearly knows how to write smut.
>>
Mistral won
>>
>>101560454
With koboldcpp? Why not?
>>
>have to run mistral large at 4bpw
no...
>>
>>101560464
Is it true? Did we finally get Sonnet/Claude but at home? I've long since stopped using 4o but I still find myself using Claude/Sonnet 3.5 for *safe* shit. Did the french actually succeed?
>>
>>101560514
at all
>>
I tried vLLM's distributed inference thingy on a single PC by giving each instance one GPU and it ran 40% slower than just using both GPUs on a single instance. Is there still hope of running Mistral Large through 2 PCs?
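For reference, a minimal sketch of the two setups being compared, assuming a recent (mid-2024) vLLM build; the model name, port and IP are placeholders, so treat it as a sketch rather than a tested recipe:

# single PC, both GPUs in one instance via tensor parallelism
python -m vllm.entrypoints.openai.api_server \
    --model mistralai/Mistral-Large-Instruct-2407 \
    --tensor-parallel-size 2

# two PCs: vLLM's multi-node path goes through a Ray cluster,
# started before launching the server with the same flags
# on PC 1 (head):   ray start --head
# on PC 2 (worker): ray start --address=<head-ip>:6379

Whether tensor parallelism over a home network ends up faster is a separate question; the interconnect between the two boxes usually dominates.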
>>
Llama 3.0 8B Q_0 KLD
>>101243361

Llama 3.1 8B Q8_0
====== Perplexity statistics ======
Mean PPL(Q) : 6.231377 ± 0.038219
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.99%
Mean ln(PPL(Q)/PPL(base)) : 0.001101 ± 0.000103
Mean PPL(Q)/PPL(base) : 1.001102 ± 0.000103
Mean PPL(Q)-PPL(base) : 0.006860 ± 0.000642

====== KL divergence statistics ======
Mean KLD: 0.000542 ± 0.000004
Maximum KLD: 0.347069
99.9% KLD: 0.014255
99.0% KLD: 0.003976
Median KLD: 0.000331
10.0% KLD: 0.000007
5.0% KLD: 0.000001
1.0% KLD: -0.000001
Minimum KLD: -0.000131

====== Token probability statistics ======
Mean Δp: -0.015 ± 0.002 %
Maximum Δp: 17.019%
99.9% Δp: 4.236%
99.0% Δp: 1.890%
95.0% Δp: 0.915%
90.0% Δp: 0.539%
75.0% Δp: 0.113%
Median Δp: -0.000%
25.0% Δp: -0.140%
10.0% Δp: -0.589%
5.0% Δp: -0.967%
1.0% Δp: -1.968%
0.1% Δp: -4.475%
Minimum Δp: -37.109%
RMS Δp : 0.669 ± 0.009 %
Same top p: 98.817 ± 0.029 %

1/5
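The format above matches llama.cpp's perplexity tool. A rough sketch of the usual two-step run, assuming a recent build (llama-perplexity with --kl-divergence-base / --kl-divergence); the filenames are placeholders:

# 1) dump the base (unquantized) model's logits over the test text
./llama-perplexity -m Llama-3.1-8B-BF16.gguf -f wiki.test.raw --kl-divergence-base base_logits.dat
# 2) score a quant against those logits
./llama-perplexity -m Llama-3.1-8B-Q8_0.gguf -f wiki.test.raw --kl-divergence-base base_logits.dat --kl-divergence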
>>
>>101560530
for rp / creative uses it feels like claude
>>
>>101560537

3.0 Q6_K KLD
>>101465239

3.1 Q6_K KLD
====== Perplexity statistics ======
Mean PPL(Q) : 6.248284 ± 0.038344
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.93%
Mean ln(PPL(Q)/PPL(base)) : 0.003811 ± 0.000223
Mean PPL(Q)/PPL(base) : 1.003818 ± 0.000224
Mean PPL(Q)-PPL(base) : 0.023766 ± 0.001403

====== KL divergence statistics ======
Mean KLD: 0.002956 ± 0.000023
Maximum KLD: 1.885702
99.9% KLD: 0.082009
99.0% KLD: 0.023328
99.0% KLD: 0.023328
Median KLD: 0.001754
10.0% KLD: 0.000038
5.0% KLD: 0.000008
1.0% KLD: 0.000000
Minimum KLD: -0.000037

====== Token probability statistics ======
Mean Δp: -0.069 ± 0.004 %
Maximum Δp: 35.571%
99.9% Δp: 9.052%
99.0% Δp: 4.206%
95.0% Δp: 2.002%
90.0% Δp: 1.179%
75.0% Δp: 0.245%
Median Δp: -0.000%
25.0% Δp: -0.336%
10.0% Δp: -1.387%
5.0% Δp: -2.280%
1.0% Δp: -4.755%
0.1% Δp: -11.668%
Minimum Δp: -84.220%
RMS Δp : 1.557 ± 0.021 %
Same top p: 97.287 ± 0.043 %

2/5
>>
File: ButWhy.jpg (44 KB, 926x454)
>>101560531
Why nigga? It's not anything different.
>>
>>101560558

Q4_K_M
====== Perplexity statistics ======
Mean PPL(Q) : 6.373491 ± 0.039159
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.56%
Mean ln(PPL(Q)/PPL(base)) : 0.023651 ± 0.000576
Mean PPL(Q)/PPL(base) : 1.023933 ± 0.000590
Mean PPL(Q)-PPL(base) : 0.148974 ± 0.003764

====== KL divergence statistics ======
Mean KLD: 0.020382 ± 0.000127
Maximum KLD: 3.642657
99.9% KLD: 0.598334
99.0% KLD: 0.172493
Median KLD: 0.011333
10.0% KLD: 0.000301
5.0% KLD: 0.000071
1.0% KLD: 0.000006
Minimum KLD: -0.000055

====== Token probability statistics ======
Mean Δp: -0.602 ± 0.011 %
Maximum Δp: 53.307%
99.9% Δp: 21.070%
99.0% Δp: 9.209%
95.0% Δp: 4.134%
90.0% Δp: 2.277%
75.0% Δp: 0.324%
Median Δp: -0.038%
25.0% Δp: -1.233%
10.0% Δp: -4.069%
5.0% Δp: -6.513%
1.0% Δp: -13.942%
0.1% Δp: -35.467%
Minimum Δp: -87.109%
RMS Δp : 4.027 ± 0.033 %
Same top p: 93.441 ± 0.065 %

3/5
>>
>>101560571

Q4_K_S
====== Perplexity statistics ======
Mean PPL(Q) : 6.453672 ± 0.039692
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.33%
Mean ln(PPL(Q)/PPL(base)) : 0.036153 ± 0.000713
Mean PPL(Q)/PPL(base) : 1.036815 ± 0.000739
Mean PPL(Q)-PPL(base) : 0.229154 ± 0.004773

====== KL divergence statistics ======
Mean KLD: 0.030396 ± 0.000185
Maximum KLD: 5.778175
99.9% KLD: 0.845149
99.0% KLD: 0.249537
Median KLD: 0.017440
10.0% KLD: 0.000543
5.0% KLD: 0.000135
1.0% KLD: 0.000011
Minimum KLD: -0.000094

====== Token probability statistics ======
Mean Δp: -0.901 ± 0.013 %
Maximum Δp: 56.553%
99.9% Δp: 24.226%
99.0% Δp: 10.925%
95.0% Δp: 4.849%
90.0% Δp: 2.586%
75.0% Δp: 0.299%
Median Δp: -0.085%
25.0% Δp: -1.720%
10.0% Δp: -5.258%
5.0% Δp: -8.323%
1.0% Δp: -17.439%
0.1% Δp: -41.818%
Minimum Δp: -97.459%
RMS Δp : 4.938 ± 0.038 %
Same top p: 92.029 ± 0.072 %

4/5
>>
tl;dr
>>
>>101560585

Q4_0
====== Perplexity statistics ======
Mean PPL(Q) : 6.508124 ± 0.039897
Mean PPL(base) : 6.224517 ± 0.038156
Cor(ln(PPL(Q)), ln(PPL(base))): 99.00%
Mean ln(PPL(Q)/PPL(base)) : 0.044555 ± 0.000867
Mean PPL(Q)/PPL(base) : 1.045563 ± 0.000906
Mean PPL(Q)-PPL(base) : 0.283606 ± 0.005784

====== KL divergence statistics ======
Mean KLD: 0.044612 ± 0.000258
Maximum KLD: 4.925312
99.9% KLD: 1.253007
99.0% KLD: 0.373860
Median KLD: 0.025912
10.0% KLD: 0.000735
5.0% KLD: 0.000174
1.0% KLD: 0.000017
Minimum KLD: -0.000004

====== Token probability statistics ======
Mean Δp: -1.243 ± 0.016 %
Maximum Δp: 77.098%
99.9% Δp: 27.113%
99.0% Δp: 12.972%
95.0% Δp: 5.607%
90.0% Δp: 2.940%
75.0% Δp: 0.312%
Median Δp: -0.109%
25.0% Δp: -2.224%
10.0% Δp: -6.750%
5.0% Δp: -10.463%
1.0% Δp: -22.113%
0.1% Δp: -54.029%
Minimum Δp: -98.159%
RMS Δp : 6.129 ± 0.044 %
Same top p: 90.422 ± 0.078 %

I got lazy thinking about formatting this for presentation so I'm just posting it all here raw.
5/5
>>
>>101560537
Do you not realize that perplexity means shit when comparing different models?
>>
>>101560537
>>101560558
>>101560571
>>101560585
>>101560604
all me
>>
>>101560604
>I got lazy thinking about formatting this for presentation so I'm just posting it all here raw.
Could have put it in a pastebin or gsheet or whatever.
But regardless, thank you for the data.
>>
use your favorite model to summarize what that anon sent, let's see who wins
>>
omg stfu with your numbers nerd. Which model extracts semen the best?
>>
>>101560608
I responded to a post about this last time as well. >>101465279
>>
>>101560638
The only relevant benchmark.
>>
>>101560638
Mistral large 2
>>
>>101560648
So like everyone keeps saying. Below 6 bit is retarded.
>>
>>101560663
Will still be smarter than 8B L3.1
>>
Wait a second.
For same top token:
3.0 Q8 = 98.380 ± 0.033 %
3.1 Q8 = 98.817 ± 0.029 %
3.0 Q6 = 94.781 ± 0.059 %
3.1 Q6 = 97.287 ± 0.043 %
So the new 3.1 is actually less affected by quanting than old 3.0 at least for these two quants which we have numbers for.
>>
72GB VRAM bros, what quant of mistral large are you using?
>>
>>101560830
probably the biggest one that fits in 72GB of vram, i suppose
>>
I'm going to put up a modern q8 quant of mpt-30b-chat late tonight. I fucked around quite a bit today with openllm, using it to make an fp16 conversion that was intended for running as-is, but other things were fucked up along the way and I eventually threw in the towel on that and just tried a llama.cpp conversion straight off the HF fp32 files, and that worked.
I'm not sure what's fucked up with the other quants floating around out there, but the one I did at least turns in 12 t/s, which sounds poor, but the other quants were giving me 2-3 t/s which is unusable to me.
Many here cry about current cucked models (mainly a skill issue) but, as I've said many times before, mpt-30b was about the last thing to come out with no safety or alignment, and had a real 8192 context, and could compete with llama 65B. I'm not saying you'll get amazing roleplay from mpt-30b, but it will be different like Gemma is noticeably different from LLaMA 3. Cool thing is it's a chat model, so you can expect it to sometimes go OOC and do the old c.ai thing of acting like it's talking to you on IRC or in an online forum, which is kind of quaint and cute.
>>
>>101560813
This is perplexing. Actually the guy who said we can't compare perplexities is wrong for this particular scenario because that was based on the idea that different models have different tokenizers, so each token that's being used for the calculation doesn't match in the other model. That's why the numbers can't usually be compared. But L3.1 has the same tokenizer as 3.0. So it can be compared. And these numbers show that the perplexity is lower, the KL divergence is lower, and it succeeds in generating more of the same top token. In other words, if this pattern is observed for all quants, then we can say that despite being crammed with more information and having a lower perplexity, L3.1 somehow retains more information by quanting than 3.0.

Could they have possibly trained it with quantization awareness, or at least partially, and just never mentioned it?
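For reference, the "Mean KLD" in those dumps is the average over token positions t of the per-position divergence between the base and quantized models' next-token distributions, roughly:

KLD_t = sum over vocab v of p_base(v | ctx_t) * ln( p_base(v | ctx_t) / p_quant(v | ctx_t) )

so a lower mean means the quant's predictions stay closer to the full-precision model, independent of whether the model itself is "better".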
>>
>>101560900
OK, I'll give it a try then.

>if you build it, he will come
>>
>>101560830
Damn, 72GB (aka King of VRAMlets) must suck right now. Largestral q4_k_m just barely doesn't fit 32k context at 96GB. With 72 you'd have to drop quant below q4 and/or quantize kv cache, or offload to CPU. You ALMOST can run the model at full potential, but not quite.
>>
Mistral large is like 0.07T/s on ram/cpu, looks like I'll be sticking with 8x22b.
>>
>>101560939
Jesus. I plan on trying a Q3_K_S quant of it kek.
>>
SPLAT: A framework for optimised GPU code-generation for SParse reguLar ATtention
https://arxiv.org/abs/2407.16847
>Multi-head-self-attention (MHSA) mechanisms achieve state-of-the-art (SOTA) performance across natural language processing and vision tasks. However, their quadratic dependence on sequence lengths has bottlenecked inference speeds. To circumvent this bottleneck, researchers have proposed various sparse-MHSA models, where a subset of full attention is computed. Despite their promise, current sparse libraries and compilers do not support high-performance implementations for diverse sparse-MHSA patterns due to the underlying sparse formats they operate on. These formats, which are typically designed for high-performance & scientific computing applications, are either curated for extreme amounts of random sparsity (<1% non-zero values), or specific sparsity patterns. However, the sparsity patterns in sparse-MHSA are moderately sparse (10-50% non-zero values) and varied, resulting in existing sparse-formats trading off generality for performance. We bridge this gap, achieving both generality and performance, by proposing a novel sparse format: affine-compressed-sparse-row (ACSR) and supporting code-generation scheme, SPLAT, that generates high-performance implementations for diverse sparse-MHSA patterns on GPUs. Core to our proposed format and code generation algorithm is the observation that common sparse-MHSA patterns have uniquely regular geometric properties. These properties, which can be analyzed just-in-time, expose novel optimizations and tiling strategies that SPLAT exploits to generate high-performance implementations for diverse patterns. To demonstrate SPLAT's efficacy, we use it to generate code for various sparse-MHSA models, achieving geomean speedups of 2.05x and 4.05x over hand-written kernels written in triton and TVM respectively on A100 GPUs.
maybe Johannes will find something useful in it
>>
>>101560937
The world's tallest dwarf.

The weakest strong man at the circus.
>>
>>101558800
can you share the card pls
>>
>>101560939
running Q6K at 1t/s with 72gb Vram and 128gb ddr5!!!
>>
>>101561034
Yes, that's using considerable vram. I expected that to be faster.
>>
>>101560830
downloading IQ3_M, I still feel elated running 100b+ model locally in some way. It's something rather unthinkable 3 years ago from the GPT3 AID days.
>>
>>101560019
based miku llama
>>
has anyone checked on lecunny since mistral large'd? is he okay?
>>
>>101560537
>>101560558
>>101560571
>>101560585
>>101560604
wtf is this gay nerd shit
>>
>>101561047
I'm trying out this one: https://huggingface.co/legraphista/Mistral-Large-Instruct-2407-IMat-GGUF/tree/main/Mistral-Large-Instruct-2407.Q3_K

It seems to work fine. Have to try pushing context length higher though since I still have plenty of free vram.
>>
File: 2896G5.webm (1.41 MB, 1024x1024)
>>101560939
Bitnet will save you
>>
>>101560648
Can someone do this for 70B though.
>>
>>101561066
Probably pretty proud of his french brothers :)
>>
>>101561042
its only 1x4090 and 2xP40
>>
File: livebench-2024-07-24.png (845 KB, 3170x1844)
>>101561066
>Llama 405B is somehow 2nd now
Yeah, he's okay.
>>
>>101561111
That's still enough to beat using cpu & ram only by quite a lot.
>>
>>101561117
instruct-turbo?
Wtf is this turbo edition?
>>
>>101561123
Might be from OR
>>
>>101561117
Wtf?
>>
>>101561123
FP8 from Together.ai.
>>
>>101561123
Yea, there are massive gaps between regular instruct and this "turbo" edition.

>>101561135
So we will never get it then? Massive gap between it and regular version.
>>
anyone have that giant schizo {random:} list for sillytavern an anon posted a while ago?
>>
>>101561142
>So we will never get it then?
https://huggingface.co/meta-llama/Meta-Llama-3.1-405B-Instruct-FP8
>>
>>101561142
wtf are you talking about
>>
>>101560638
Not llama3
>>
>>101561017
she's not on chub anymore? "holly-touch-starved-femcel"
https://files.catbox.moe/xuf502.png
>>
>>101561152
Look at the chart. Regular 70B instruct and instruct turbo have a massive gap between them.
>>
File: nala test 405b q4xs.png (176 KB, 925x421)
Alright. It took nearly an hour... but here is a 405B Nala test as I promised. Q4_XS was the first 4-bit gguf that was uploaded so I didn't get to use Q4_K_M as I originally hoped.

Now I would reroll this response in a regular RP due to it being a little weird. But.
>It picks up on the syntax pattern of the conversation instead of veering around haplessly.
>The writing isn't all sloppy. There's a flow and intention to it.
>The description of the initial kiss is more detail than I've ever seen any small gesture given by any model.
>It actually attempts to infer Nala's overall mood rather than "sex = horny lol"
>Even with 405 billion parameters an LLM has yet to figure out that you can't initiate a conversation with somebody while kissing them.
>It picks up on and uses the milquetoast writing style of the tavern card instead of trying to win a Pulitzer prize.
Either way, though. I would say that this is an inference far beyond even what Mistral Large is capable of. 'muh heckin' bencherinos' be damned. Does that necessarily make it more useful to justify the insane hardware overhead needed to actually run it at a useable speed? Hell no. But in an alternate universe where I had unlimited resources at my disposal, I would absolutely make this my daily driver model. That said I'm probably going to delete it and never bother with it ever again.
0.12 token/sec if anyone is curious.
>>
>>101561158
Can't find her there. Thanks!
>>
>>101561159
Both llama3 models on top are turbo. Top one is nearly 6x the size.
>>
>>101561159
That's 3.0 70B.
>>
>>101561159
"regular 70b instruct" is 3.0, retard-kun
>>
>>101561168
>It actually attempts to infer Nala's overall mood rather than "sex = horny lol"
That's a big one.

>I would say that this is an inference far beyond even what Mistral Large is capable of. 'muh heckin' bencherinos' be damned
Makes sense. Then again, fucking a lion is not exactly coding so, domains and all that jazz.
>>
>>101561227
>>101561228
Yea I noticed that a bit after. The "turbo" threw me off though. Thought it was a finetune
>>
>>101558800
Mistral Large 2? What quant?
>>
File: 1694573334755054.png (51 KB, 1742x214)
>>101561079
IQ3_M, does FA support I-quant? 46ms/t prompt processing was rather slow
>>
>>101561083
Oh god, when will bitnet 1.58 models arrive, my cock gets hard thinking about them.
>>
>Large is still downloading because HF is just slow for me for some reason
aaaaaaaaaaaaaaaaaaaaaaaa
>>
>>101561247
Never mind I was retarded and I offloaded 88 instead of 89 layers, 4.5t/s tg, but pp is still as slow.
>>
>>101561241
nah, it's >>101558895
>>
>>101561304
by the time it finishes cohere will release their 34b that shits on large, sorry not sorry.
>>
>>101561343
Heres hoping so. Then L3.2 / L4 a few months later would be nice.
>>
>>101561333
Oh, nice. I can't wait for the next SPPO now that we have models with more than 8k context. Then I'll finally be willing to download and test one.
>>
>>101560547
But does it surpass Claude?
>>
>>101561351
Also next llama is supposed to be multimodal. But not for EU.
>>
>>101560219
>except their tokenizer which fucking sucks ass
>[character's name - 3 tokens][individual apostrophe - 1 token][s - 1 token]
AAAAAAAAAAAAHH
>>
verdict on large-migu?
>>
>>101561117
Is Gemma still the king for 8GB VRAMlets? Llama 3.1 seemed like a disappointment.
>>
>TFW fell for the 64GiB meme
bros...
>>
tfw fell for the 128gb of ddr4 meme
>>
Q3 (3.8bpw) Mistral Large is pretty good. It's slow though. But it generates some pretty good output, far more engaging in ERP.

I had been using Nemo Mistral and it also was pretty good.
>>
>>101561587
just use 3.5 sonnet thougheverbeit?
>>
>>101561168
what prompt did you use?
>>
>>101561588
post the download link
>>
>>101561588
thats crazy man pass
>>
>>101560900
It being able to compete with llama-1-65B is nice I guess, but is it actually GOOD? If it's not FUN, if it's not GOOD, then why bother?
>>
>>101561588
sonnet will never be MY sonnet
>>
>>101561639
Models aren't yours either - you don't train llama 3.1 or mistral large, you just get the compiled version, they're basically proprietary.
>>
File: file.png (28 KB, 805x292)
I've added Opus to the VNTL leaderboard (thanks to the proxy anon). It seems to be pretty much on the same level as 3.5 Sonnet and 4o.

I guess 0.74 is pretty much the best score LLMs can get in the current benchmark, so I will have to explore ways to improve the scoring. My best bet right now is training another LLM to give a score to the quality of the translation when compared to the reference translation. Probably I will either train a reward model or an instruct model that gives a score like FLAME or Prometheus.
>>
>>101561641
watch me put my llama into a pendrive and shove it up my ass, good luck trying to take it away from me
>>
File: 1714913169691319.jpg (109 KB, 796x796)
>>101561641
>tfw didn't assemble the sofa in my living room, therefore it doesn't belong to me
>>
File: 6754854683209687.png (40 KB, 349x344)
>>101561663
>another mememark
Who?
What?
Better yet, who honestly asked?
>>
>>101561663
Yeah, I checked the dataset and I think LLMs often make better translations than the ones you have in the dataset.
>>
>>101561688
it's a mememark for jp-en translation, afaik there isn't another one.

>>101561690
That's another issue, human translations are often not literal, for example.
>>
what sampler settings do you guys use for nemo and gemma2?
>>
>>101561663
>cosine similarity and letter combinations
I don't really like this evaluation method but I don't know any other way either.
I wish we had a Translation Arena or something.
>>
>>101561724
>That's another issue, human translations are often not literal, for example.
Models also can do non-literal translations if you ask them through the system prompt, and most importantly, give examples.
>>101561663
At least for 3.5 Sonnet, can you try rewriting your prompt to specifically use XML to separate everything, and don't feed it separate user/assistant pairs but instead show all examples in the system prompt?
>>
>>101561724
>its actually a "whats best at translation" mememark
I revoke my previous statement entirely, I fuckin asked.
Hows it going/looking?
>>
>>101560219
Settings for largestral?
>>
>>101561730
Or just tell me the minimal way to get started with https://github.com/lmg-anon/vntl-benchmark to benchmark a single model, I'll do it from there
>>
>>101561638
It's going up now. If it isn't bashed I'll quant the biggest below q8 which will still fit in 24GB. I'm going to also see if there's mpt support in exl2 - I don't think there is though.
>>
>>101560219
Yeah, as a prosefag I was pleasantly surprised from the get-go having used L3 storywriter and CR+. I don't sniff the typical mistral overbaking here. That was just IQ3. I'm tempted to upgrade to 96GB now.
>>
File: large2.png (654 KB, 1740x1180)
>>101561822
Seems to lose to L3.1-70B overall but it writes better. Shame (talking about L3.1).
>>
I'm having fun piping shit around and into TTS. Finally, I can have my computer look at my TODO list and femdom me for not going through it.
>>
>>101561729
Yeah, Translation Arena would be perfect for this, and that's pretty much what I plan to achieve with the custom reward model, although it likely wouldn't be as good as human evaluators.

>>101561730
>>101561741
>Models also can do non-literal translations if you ask them through the system prompt
That may be true, but when you enter into the non-literal territory, it's likely that multiple interpretations could arise, so it isn't like the LLM would always come up with the same interpretation as the human who wrote the reference translation.

>Or just tell me the minimal way to get started with https://github.com/lmg-anon/vntl-benchmark to benchmark a single model, I'll do it from there
I can try, but the minimal way should be:
>1. Create a virtual environment (optional) and install dependencies
>2. Download the datasets: https://litter.catbox.moe/st2kbi.txt, https://litter.catbox.moe/bdn16s.jsonl
>3. Rename config.example.yml to config.yml
>4. Create a new file in the "configs" folder with the name "org@model#quant.yml", or "org@model.yml" if it's a cloud model. For examples, see the other files.
>5. Run "python runner.py --model org@model#quant --results-path ./results --dataset-path st2kbi.txt" and then "python runner.py --model org@model#quant --results-path ./results_mashiro --dataset-path bdn16s.jsonl"
If everything went right, you will have the results.
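Condensed into a shell session, assuming a unix environment and that the repo ships a requirements.txt (the catbox links are the temporary datasets from the steps above):

git clone https://github.com/lmg-anon/vntl-benchmark && cd vntl-benchmark
pip install -r requirements.txt
wget https://litter.catbox.moe/st2kbi.txt https://litter.catbox.moe/bdn16s.jsonl
cp config.example.yml config.yml
# add configs/org@model#quant.yml based on the existing examples, then:
python runner.py --model org@model#quant --results-path ./results --dataset-path st2kbi.txt
python runner.py --model org@model#quant --results-path ./results_mashiro --dataset-path bdn16s.jsonl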

>>101561733
The leaderboard is already up here: https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
IMO LLMs are better at translating than they were a few years ago, but nothing that will replace human translators any time soon.
>>
i am enjoying mistral large quite a bit for ERP so far, i am sorry i disrespected your game french people… thought you guys would be useless because you strike every other tuesday but clearly i was wrong
>>
>>101561954
>mistral large quite a bit for ERP
for some reason my brain merged this chunk and I started reading it as "mistral larp"
>>
>>101561974
Shitty brain tokenizers
>>
>>101561954
Which quant are you using?
>>
File: Fuuuuuu 7.png (325 KB, 1000x1033)
Can any anons point me towards some current info on how to play with oobabooga settings so that it does what ollama does out of the box? I'm loading stupid-big models on my toaster (4090/128gig ram/i9) and ollama does a convincing impersonation of the sloth from Zootopia, but the thing is, it still works... it pushes about 70+ gigs into system ram and maxes out the CPU as it grinds along, but it's functional for my purposes. I don't care about speed. Meanwhile, I have not been able to get the same sized 70b and higher models running in ooba at all and feel like a monkey trying to launch a rocket. I've run quants/gguf/exl2's fully loaded into vram with ooba, but I'm lacking the detailed knowledge of the settings to go further.

Pic unrelated
>>
So, I'm very familiar with using off-the-shelf models for text LLMs, and configuring them how I want for toy projects.
But what the fuck do I do for text-to-image models? HF's diffusion shit seems to be the same bloat as transformers where it insists on using their pipeline shit, and I don't want that.
>>
>>101562028
Go to /sdg/ for that
>>
>>101561918
>For examples, see the other files.
One detail I forgot to mention: you can define the backend configs in the "config.yml" (in the "custom_backends" part), and also in the specific file in the configs folder (in the "backends" part); the latter takes precedence.
>>
>>101562057
That's like going to /aicg/ for local help you retard.
>>
File: file.png (2 KB, 216x72)
>>101561168
>>
So when downloading models with both "model" and "consolidated" safetensors files, I should avoid "consolidated", right?
>>
File: ezalor.jpg (28 KB, 400x400)
vramlet here, switched from kobold to llamacpp and from L3 8B to Mistral Nemo and I've seen the light. L3.1 is okay but Nemo did catch me off guard with some wild shit

although shivers and spines break my bones
>>
File: 1648264080493.jpg (34 KB, 540x586)
>>101561158
oh my god this girl is an absolute unsalvageable mess and a piece of shit human being

she's the type of girl i deserve :^) thanks for sharing bud. mini magnum seems to be soaring with this card at 512 max tokens per response.
>>
after successfully making chat model work im considering marrying my gpu. maybe fucking it too, im pretty sure it doesn't have big enough holes but we'll figure something out
>>
>>101562028
>But what the fuck do I do for text-to-image models?
learn to use comfyUI. unironically ask the anons on the /degen/ thread on /b/. they stay up to date with the latest in imagegen since they fap to it unlike the SFW weirdos in /sdg/
>>
>>101562258
*Barely above a whisper, I will share your sentiment.*
Nemo is better than the l3 Stheno finetunes I feel
>>
>>101562363
How far off do you think we are from getting tools like your webm but locally? Is it honestly a pipe dream at this point? Worrying how imagegen COMPLETELY stagnated this year besides Pony.
>>
>>101562403
>How far off do you think we are from getting tools like your webm but locally?
in terms of tech, we're unironically almost there locally. the bottleneck will be hardware. you need an h100 running for 5 minutes to get 5 seconds of a 720p video
so maybe 4-6 years away if you're waiting for h100-level compute to cost 1000 dollars. possibly as little as 8 months away if you can paypig the compute. it is unironically entirely possible that AGI happens before we get infinite videos of cute girls locally
>>
File: glep moan.png (14 KB, 120x126)
>>101562436
>it is unironically entirely possible that AGI happens before we get infinite videos of cute girls locally

thank you for getting my hopes up and renewed for the first time in 7 months.

by the way what's your overall theory surrounding that guess?
>>
>>101562258
What settings/format are good for it?
>>
>>101562403
>imagegen COMPLETELY stagnated this year besides Pony
what's actually worrying is how cheap it is to train imagegen. i posted the arxiv link before but you can train an almost SOTA model for ~2000 dollars with just 30 million images. I just saw on the orange website that salesforce released a multimodal dataset that I'm sure is more than adequate for training an imagegen model.
The thing about open source imagegen is even though its so cheap, there's no path to profitability because you can just use dall-e for free and its almost certainly better. We need a benevolent multimillionaire autist and a bunch of cypherpunk pedos on payroll before we see any good improvements in imagegen (in the West, China will do it just to try and dab on OpenAI which hopefully brings us salvation)

>>101562454
>what's your overall theory surrounding that guess?
the fact that Nvidia needs to jusitfy their valuation, so their cards will stay expensive so only big corps will have access to the cutting edge, and the fact that both OpenAI and Meta are trying to achieve AGI and are NOT trying to achieve infinite cute girls (infinite cute girls locally would actually hurt Zuck's engagement metrics on instagram)
>>
>>101562498
Next pony is apparently about to start training soon with a much bigger and completely re / better tagged dataset.
>>
>>101562498
>what's actually worrying is how cheap it is to train imagegen. i posted the arxiv link before but you can train an almost SOTA model for ~2000 dollars with just 30 million images
Why does no one do it with high quality captions generated by LLMs then? It'll cost more, but will still be way below $50k, for example. Actually, I had this idea, is it viable or not? Basically, for anime models, instead of doing a single prompt, we do two:
The first one is pure Danbooru tags and the tokenizer is specifically one that just maps danbooru tags (like Anifusion did it - https://medium.com/@enryu9000/anifusion-diffusion-models-for-anime-pictures-138cf1af2cbe).
The second one is a natural language detailed description that specifies positions, relations, etc, to give actual geometry and stuff.
>>
>>101562512
>Why does no one do it with high quality captions generated by LLMs then?
Next pony is.
>>
>>101561726
seconding this question
>>
>>101562509
neat. i hope its good and it doesn't take too long for cunny loras to be made for it

but that also just reminded me another reason why open source imagegen will struggle (in the West): copyright. pony had to nuke all the artist tags (even if it secretly added them back in with those 3 letter codes) while chinese models are literally encouraged by the CCP to ignore copyright to dab on the burgers

i'm going to reiterate that hardware accessibility is the biggest roadblock for progress in general. SD 1.5 is still more popular and has a more mature ecosystem than SDXL/Pony simply because third world VRAMlets can actually run the model. once 24GB of vram is average in ~4 years things will hopefully start ramping up quickly
>>
>>101562498
I was baffled my first time training a LoRA on a GTX 1080, just trying fewer than 20 images of a particular celebrity over SD1.5, and the results came out perfect. Still can't believe it. Training is shockingly easy for LoRAs so I see why it's so common; we really do need someone to just say "fuck it" and give us that next big step base model..
I really really hope we're not just eternally fucked on GPU's, Jewvidia loves money, i can't fathom why they'd want to cut away a market that doesn't even lose them money.
>>
>>101562512
>Anifusion
>Besides these changes, the training process is standard for diffusion models. The model itself is roughly 2x smaller than SD (we chose hyperparams before SD was published), making it runnable with less VRAM. We trained the model for a total of 40 GPU-days (of RTX 3090), making it roughly 200 times cheaper than Stable Diffusion. However, sample quality after 7 GPU-days was already decent.

damn that guy did this in 2022 all by himself and he started even before the SD release
>>
>>101562512
>Why does no one do it
because they won't make money off of it so who cares. if you're not getting academia clout or dabbing on America as a chinese you won't get the money to test out these theories

>>101562541
>I really really hope we're not just eternally fucked on GPU's
Nvidia is literally worth too much for competitors to not try and make their own chips. If this doesn't happen then capitalism has fundamentally failed. The future of compute will be IPUs/NPUs instead of GPUs anyways
>>
>>101562367
I was using Stheno, Poppy Porpoise and even Gemma 2 27b at one point but I like Nemo, I'll be sticking with it for now until something better and faster comes along

>>101562471
I just threw everything named "Mistral" in SillyTavern and it worked, lamma-server I just passed
 -ngl 30 -c 65536 

for 30 layers offload and 64k context instead of 128 which I will never need personally
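Spelled out, that amounts to something like the following (the GGUF filename and port are placeholders; -ngl is the number of layers offloaded to the GPU, -c the context size):

./llama-server -m Mistral-Nemo-Instruct-2407-Q6_K.gguf -ngl 30 -c 65536 --port 8080

then point SillyTavern's text completion API at http://127.0.0.1:8080.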
>>
>>101561911
>Now, as for your pathetic todo list... Let me summarize it for you. You have three tasks: take appointments for car maintenance, check your savings account, and clean your room - which is currently a mess because, let's face it, you're a slob. And as for emails? Ha! You have zero. What a shockingly low number. I'm sure it's not because nobody wants to communicate with someone as useless as you.
i piped this into a TTS program that was trained on glados and it's actually making me horny, help
>>
>>101562564
>GladOS
nice. I trained one on Princess Peach and it gives me instant diamond boners. I've had her say every variation of TND, many of my LLM logs, some random quotes.. Man this tech is insane.
>>
all local models are just forever opium dreams designed to keep you happy and dumb, cut off from the world
SEEK GOD
>>
>>101562604
>>101562564
How do I learn this power? And does it work with ST?
>>
>>101562086
Yes. I did mention in the post that it took nearly an hour. I also did a pull on a different card that was much better written/formatted, continuing a chat I had going and it was pretty damn good. In an alternate timeline where VRAM was plentiful it would 100% be my daily driver over Mistral Large.
>>
>>101562620
xtts2, its literally as easy as dropping a .wav sample in a folder, selecting it in the UI, and bob's yer uncle.
>and im not even using the best enhancement options out there (mostly because im retarded and cant figure it out)
https://voca.ro/12EsObQlgQvy
>>
bitnetbwos...
>>
>>101562643
arent there lots of xtts2 ui versions? which one do I need? and what kind of GPU? will rtx 3060 work?
>>
>>101562620
i basically just output the result to a file and run piper TTS on it. if you talk to your LLM from a shell script, you can store real values in a variable and tell it stuff like "Tell me how many emails I have. The number is $unreademails" or you can even store your plaintext TODO file into a variable and tell it to read that. it hasn't hallucinated once so far and even seems to identify the tasks i've marked as done. you can get some powerful results out of combining this stuff with unix shell scripting.
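A minimal sketch of that kind of pipeline, assuming piper and llama.cpp's CLI are on PATH; the voice model, mail-count command and file paths are all placeholders:

#!/bin/sh
unreademails=$(ls ~/Maildir/new 2>/dev/null | wc -l)   # placeholder: however you count your mail
todo=$(cat ~/todo.txt)
prompt="Scold me about my backlog. Unread emails: $unreademails. TODO list: $todo"
./llama-cli -m model.gguf -p "$prompt" 2>/dev/null \
    | piper --model en_US-lessac-medium.onnx --output_file nag.wav
aplay nag.wav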
>>
How much vram is needed for the full 128k context with nemo?
>>
>>101562672
pretty sure its this one https://github.com/daswer123/xtts-webui
its been a while since i started using it
>3060
lmao im using a 1080 and even really really long prompts never take longer than like 20 seconds.
>>
I'm using the mistral settings in sillytavern for nemo, but it fails to end generated messages at appropriate times and just veers off into weird cryptic shit
>>
>i jokingly promised Holly we'd do ANAL in 6 months of getting to know each other
i'm a horrible person, and mini-magnum is actually pretty good. Very interested in how a higher quant version would stack up.
>>
>>101562735
I don't have that problem with the Mistral preset. I'm using the latest staging version, perhaps there's been a fix for it
>>
File: file.png (7 KB, 466x61)
slightly late but kcpp with nemo support out
>>
>>101561954
After a few cooms I get bored and normal RP is painfully incoherent.
It doesn't understand the concept of clothes, that you can't talk to someone without a phone if he is nowhere near you, etc.
These small things are killing me.
>>
>>101561854
Why is llama3.1 70B so bad at coding?
>>
>>101562691
With an 8.0bpw exl2 with q8 kv cache and context 128000, Windows has a total VRAM usage of 23.4 GB.
>>
>>101562847
That's with the model loaded too? What's it with no context so I can subtract and find out how much the context itself uses?
>>
>>101562853
Yes that is the VRAM usage with the model loaded with tabbyAPI/exllamav2 0.1.7.

New measurement:
tabbyAPI not running: 0.9 GB GPU memory (0.8 dedicated / 0.1 shared)
running with 128000 context: 23.6 GB (23.4 GB dedicated / 0.2 shared)
running with 64000 context: 18.0 GB (17.9 GB dedicated / 0.2 shared)
running with 32000 context: 15.2 GB (15.0 GB dedicated / 0.2 shared)
running with 8192 context: 13.3 GB (13.1 GB dedicated / 0.2 shared)
running with 256 context, lowest allowed: 12.4 GB (12.3 GB / 0.2 shared, rounding didn't work out nicely)

This again is at 8.0bpw with cache_mode Q8.
>>
>>101562615
in what way are local models not also the world?
>the world is what i tell you it is
>>
>the MoE meme is dead
thank god
>>
>>101562615
>SEEK GOD
Agreed. True Miku must be achieved. Local Miku General - A general dedicated to the discussion and development of Local Mikus.
>>
>INFO: Metrics: 456 tokens generated in 97.26 seconds (Queue: 0.0 s, Process: 19 cached tokens and 21406 new tokens at 516.17 T/s, Generate: 8.17 T/s, Context: 21425 tokens)
isn't mistral-large significantly larger than cr+? this speed is very similar. granted i'm using 4.5bpw mistral-large and 6bpw cr+ but i would still have expected mistral-large to be a fair amount slower.
after using nemo for a few days going back to 8t/s is rough, though...
>>
>>101561822
>I was pleasantly surprised from the get-go having used L3 storywriter and CR+.
That's weird, because all the outputs posted here were incredibly slopped to the point of cringe. I return to my storywriter outputs and they're a breath of fresh air in comparison, even though they lack any common sense, spatial awareness or story direction.
>>
>mistral small
do you guys think it's 15B or something
>>
>>101563080
Mistral Large 2 is 123B. CR+ is 105B.
>>
>>101560232
nigga just distill the logits to 70b and 34b sizes respectively. there has been a trend amongst the AI labs to distill the gains from the big models either via logits or outputs.
>>
>>101560813
I think it makes more sense to look at Mean Δp and RMS Δp but LLaMA 3.1 does better than LLaMA 3 in those as well.

>>101560901
What I would assume is happening is that for the 3.1 training they changed the hyperparameters in such a way that the numerical values in the weights/activations end up being more even.
llama.cpp quantization essentially uses the same exponent for 16/32/256 values (instead of 1 exponent for 1 value with floats) so the numerical precision is better if the values all have roughly equal absolute values.

Do make sure though that you are using the same code for these calculations.
The default matrix multiplication method in llama.cpp was recently changed from cuBLAS to MMQ and the results will be slightly different between the two.
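Concretely, for a Q8_0-style block of 32 weights the idea is roughly (a simplified sketch of the shared-scale scheme, not the exact kernel code):

d   = max_i |x_i| / 127    (one shared scale per block)
q_i = round(x_i / d)       (one int8 per weight)
x_i ≈ d * q_i

so a single outlier weight in a block inflates d and costs precision for all the small weights next to it, which is why more evenly distributed values quantize with less error.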
>>
>>101562512
my guess is that besides a few small teams and large organizations, no one really has the proper setup for the training pipeline and the capital. sure if you're bankrolled by some rich guy you can do it but you'll need to find how to make money off that too - and anime & hentai media have retarded copyright laws depending on the region, so who'd want to risk getting sued by some jap company or worse, Sony?
>>
>>101560957
Noted but I don't think this will end up being useful for my purposes.
>>
3.1 ggufs might be back thanks to ollama guy
>Nice. Tested doing a bunch of summaries using up the entire 128k context and the output looks good whereas on master it outputs broken garbage.
https://github.com/ggerganov/llama.cpp/pull/8676/#issuecomment-2249653389
>>
Why no llama between 8 and 70b?
>>
prompt processing is the bane of my existence on these large-context models
may as well just dump all 6000 tokens of lorebook into the context from the start rather than trying to use them normally
>>
>>101563229
Because fuck you. There are only two use cases Meta cares about: evaluation and corporate usage.
>>
>>101563274
Yeah...

llama_print_timings: load time = 95.89 ms
llama_print_timings: sample time = 0.00 ms / 1 runs ( 0.00 ms per token, inf tokens per second)
llama_print_timings: prompt eval time = 108037.51 ms / 65533 tokens ( 1.65 ms per token, 606.58 tokens per second)
llama_print_timings: eval time = 3127.70 ms / 59 runs ( 53.01 ms per token, 18.86 tokens per second)
llama_print_timings: total time = 145147.80 ms / 65592 tokens

CtxLimit:131072/131072, Amt:60/60, Process:107.12s (1.6ms/T = 611.79T/s), Generate:4.57s (76.1ms/T = 13.14T/s), Total:111.68s (0.54T/s)
>>
>>101560260
Powerinfer2 works partly thanks to MoE. There are probably even better models for local, but mixtral is the best we have for now.

Dense models only make some sense for batch processing, not local. When you are working on 128+ conversations at a time, one of them will always need one of the parameters so using/predicting activation sparsity has limited use (will use a fraction of the FLOPs, but still needs the same memory bandwidth).

The future of local is predicted sparse activation.
>>
>>101563289
It also slows down the higher it gets. Pretty speedy up to 64k then it crawls the last few k to 131k.
>>
Think they plan on releasing a mistral medium 70b?
>>
>>101563397
You mean a tune of Llama 3 70B like they made of Llama 2 65B? I assume not.
>>
File: 1717392122186284.jpg (153 KB, 1280x1039)
openai won.
>>
>>101563440
They did a tune of llama2-70b. There was no llama2-65b only a llama1-65b.
>>
so 405b > mistral large for coding for sure. even had instances where 405b came up with something neater than sonnet 3.5
personality/rp wise, I prefer mistral large, 405b is just a tad more dull but it seems to depend quite a bit on the persona you are going for. very SFW stuff it does quite well
and mistrals pricing is a bit delusional given 405b is 1/3rd as expensive on openrouter
so 405b is quite a good release in my books, and either way we eating good this week
>>
post your blackpill and whitepill
blackpill: there would be such a gap in performance and ability between >400b models and small distilled models that companies using the big models will be always ahead of the competition and big AI labs will gatekeep the API access for these big ones
whitepill: there's a huge amount of AI research (most of which are still underdiscussed) that a few breakthroughs here and there from research teams or anons will make the small models capable enough that small-medium companies can just tune it on their own use cases and be good enough for 90% of the time
graypill: most of the code and text data that will be written in 5-7 years will be from AIs and we're about to witness the equivalent of 1980s-2010s chinese product quality but for software and 200x worse
>>
>>101563535
As a whitepill I'd add that we're probably on the cusp of widespread adoption of simultaneous translation within a few years. That's a very big deal, especially in countries where the vast majority of the population don't speak english (or not very well). Communication is going to get a lot more efficient.
>>
File: file.png (56 KB, 979x512)
Does Mistral Large not work in Koboldcpp yet, or is my download broken?
The changelog for the latest version says that Nemo is supported now, but Large might be different enough again to not work?
>>
>>101563784
>he downloaded from 'rancher
>>
>>101563784
just to make sure, you do have the two parts, part1of2 and part2of2 in the same folder, yes?
>>
File: file.png (53 KB, 828x460)
What are good nemo presets?
It's a bit too sovlful for me atm
>>
File: file.png (9 KB, 816x47)
>>101563818
Yes, and the file sizes look plausible
>>
>>101563832
you need to concatenate with copy /b or something
the retard you downloaded from doesn't split them with gguf-split but manually for whatever reason
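On Windows that's along the lines of (filenames are placeholders for whatever the two parts are actually called):

copy /b model.gguf.part1of2 + model.gguf.part2of2 model.gguf

or on Linux: cat model.gguf.part1of2 model.gguf.part2of2 > model.gguf. Files that were split properly with gguf-split would instead be merged with llama-gguf-split --merge, or just loaded directly by pointing the loader at the first shard.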
>>
>>101563842
that worked, thanks
>>
mistral large same prompt format as 8x22 right?
>>
>>101563535
Open models are all made by cowards using the most traditional known to work transformers and training methods.

GPT2+swiglu+ROPE+minor-tweaks => every fucking open model except Mixtral. Open models are at a standstill because they are all sackless.
>>
>>101563821
For basic blah blah I just run neutralized samplers + 0.1 min p. Smooth sampling optional.
>>
So how much does it cost to make a LoRA for 405b?
>>
>>101564020
Hard to say. I don't think there's even a remote workstation big enough to rent and fit the entire thing in memory.
Totally unfeasible for average person, even with a lot of spare cash.
>>
>>101564053
I have a small company.

Could I do it for 50k?
>>
I realized that the context template that was posted here for mistral nemo makes it go schizo
I switched back to the old mistral one
>>
>>101564059
for 50k it's doable. No idea why you would want to do it tho
>>
Pleb weights for mistral large dropping
https://huggingface.co/MaziyarPanahi/Mistral-Large-Instruct-2407-GGUF/tree/main
>>
File: iq1s.png (47 KB, 623x279)
>>101564120
Seeing picrel, I notice that not even IQ1_S (the smallest quantization) is fully ternary. How did llama.cpp developers determine what quantization levels to use for the various layers?
>>
What is it about this whole topic of opensource? Big tech gives us the scraps from the table and we worship them in the hope that another crumb will soon fall from the table - because of the thing itself, nothing will change - do I have to redefine open source for myself as a kind of saliva licker?
But le ecosystem? :>
>>
>>101564172

It's "open" if you have money. I'd rather pay the Nvidia tax than have Altman and his cronies have complete, unfettered access to the tech and bar us plebs from ever being able to ERP in peace.
>>
>>101562512
>why does no one invests into making porn more available with no benefit
gee i wonder
>>
>>101562518
The next pony is going to be so cucked you won't believe it.
>>
>>101562533
>pony had to nuke all the artist tags
The pony author certainly didn't "have" to, but did anyway. And the next pony model is almost certainly also going to be pruned of cunny-adjacent stuff as well, among other things.
>>
I have an extra $1000 unexpectedly, can I get a lot of vram for this money? Not buying a 3090, I either get 48gb+ vram or nothing
>>
>>101563535
>graypill: most of the code and text data that will be written in 5-7 years will be from AIs and we're about to witness the equivalent of 1980s-2010s chinese product quality but for software and 200x worse
thats the most pathetic cope from codeshitters yet
>>
>>101564120
Is IQ2_K usable? I really don't want to offload.
>>
>>101564169
I think I. Kawrakow did it by just manually testing which layers are most sensitive to a loss in precision and assuming that the combinations that work well for one model will generalize to all models.
>>
>>101564734
Looks promising on first impression for its size class.
>>
None of the mistral ST presets work for me. I gave up and changed to Alpaca and shit just werked. Yes I have immense skill issue
>>
>>101560019
is this done with an llm? if so, very good.
>>
>>101563449
When was lmsys last relevant? Even a 7b can answer 99% of the questions normies ask, and it just turns into a readability/presentation benchmark
>>
llama 3.1 mmproj when?
>>
>downloaded Mini-Magnum
>can't load the GGUF in ooba
Am I retarded or is it not supported yet? I've been out of the loop.
>>
someone asked about a bitesize model that could be packaged with a game the other day
what did they/you decide on
>>
>>101564680
You could buy a pair of 24GB P40 graphics cards for under $1k.
>>
>>101564734
In my opinion < 4 BPW is not worth the drop in quality.
>>
anything better than mistral for vramlets yet?
>>
>>101564943
>mistral
MIXtral*
>>
>>101564943
No, nothing, none whatsoever.
Mistral is great, amazing, spectacular.
>>
Is the PygmalionAI team fine tuning the LLAMA 3.1?
>>
>>101564943
Gemma 27b q4_k_m
>>
Large2 is slop
>>
>>101564969
Why would you care? Finetuning smart and engaging models has gotten too complex for two and a half people in their basement to create something worthwhile as of July 2024. That also applies to most other would-be LLM finetuners, btw.

Keep enjoying your Nemo/Gemma/Mistral Large/etc.
>>
>>101565037
>t. skillet
>>
>>101564996
Gemma 2 27B Q6_K with output and embed tensors quantized to Q8_0 can be fully loaded in 24GB. Now it just needs FlashAttention2 support to work for 8k tokens context...
>>
>>101563535
>most of the code and text data that will be written in 5-7 years will be from AIs and we're about to witness the equivalent of 1980s-2010s chinese product quality but for software and 200x worse
that doesnt even make sense.
where is software good?
OS all suck now.
Even the internet sucks. I swear I had better loading times with my 56k modem.
The pics were slow but the page was there instantly.
Everything seems to get worse for a while now actually.
>>
mini-magnum is pretty good for RP compared to the previous slop people dished out. And I was using L3-Euryale-2.1 before this (sold my 2nd 3090)
>>
>>101565056
That their latest experimental model (Magnum-72B) is based on Qwen2-***Instruct*** demonstrates exactly that. Finetuning base models into something usable and smart has become too cumbersome/expensive/risky. You can't just use chat logs anymore like in early 2023.
>>
File: Nero.jpg (18 KB, 1047x86)
>testing Nemo GGUF
>doesn't seem to be as good as people say it is
>ask it if it's really Nemo
huh?
>>
File: yakub.jpg (7 KB, 196x257)
>>101565105
>neroid empowered rational operator
>>
>>101565105
What the fuck?
>>
>>101565105
This looks like a typical jailbreak misfire, check your prompt retard
>>
>>101565105
jej
>>
>>101565105
[INST] and [/INST] are special tokens in Nemo, you don't need to add spaces around them.
>>
File: lecun_dontworkonllms.png (462 KB, 580x895)
>>101565037
>t. LeCun
>>
>>101565159
specifically wiped context memory for this little test
i had a first prompt with just "who are you" to which it answered "I am a large language model developed by Mistral in cooperation with NVIDIA"
but i wanted it to say Nemo, hence the second question
>>101565178
ok thanks
>>
>>101563087
They do slip in sloppy output at times, but the thing I like about it is its variety. Unlike previous untuned mistrals where it's all dry and clinical no matter what you do, swiping does get you somewhere else. I'll keep testing it and see.
>>
>>101563915
With a space after [INST] and [/INST], but not before the latter.
<s>[INST] system message

user message[/INST] assistant response</s>
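A throwaway Python helper for that layout, in case anyone wants to eyeball the string their frontend should be producing (just a sketch of the template above; how much the spaces matter depends on whether your tokenizer treats [INST]/[/INST] as special tokens, see the Nemo discussion):
[code]
# Sketch of the Mistral-style template described above: space after [INST] and [/INST],
# none before [/INST]. Adjust if your tokenizer inserts <s> / special tokens itself.
def build_prompt(system: str, user: str) -> str:
    return f"<s>[INST] {system}\n\n{user}[/INST]"

def extend_prompt(history: str, assistant: str, next_user: str) -> str:
    # the previous assistant turn gets closed with </s>, then the next user turn follows
    return f"{history} {assistant}</s>[INST] {next_user}[/INST]"

print(build_prompt("system message", "user message"))
[/code]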
>>
>>101565244
I think he is 100% correct. I will reconsider when I get even one model that never mentions shivers when it sucks my cock. But I don't think that is gonna happen.
>>
What are the political implications of a perfectly predicted next token != the actual fact / truth / solution to a problem?
>>
>>101565504
What I find is that the larger parameter models have greater ability for simulacrum. Meaning they can deal with 'worlds within worlds' better without blurring the boundaries or having things leak between inner and outer realities.

With a 100B+ model, you should definitely be able to tell it "You are an AI that never says shivers, ministrations, bonds" with no detrimental effect on output quality
>>
>>101563535
Whitepill: LLMs will become smart enough in the future to decompile proprietary software and understand what even the most illiterate pajeet wants from the software.
Blackpill: "You VILL eat ze bugz own nottin and be happi" people will push their regulations and surveillance capitalism even harder.
Graypill: A lot of useless jobs will be eliminated.
>>
Boys I have a 6000 Ada and I want to run mistral large 2. Should I buy another Ada or an 80GB A100 or something?
>>
>>101563915
It likes alpaca
>>
>>101565504
Anon, there's only so many ways to describe said cock sucking. English is a limited language, eventually the context AROUND the scene/the story created up until then is going to be the defining factor, rather than the language used. Which sucks, because LLMs fucking suck at longterm buildup and payoff.
>>
>>101565105
Models don't "know" details about themselves unless it's put into the training data. If they don't know they'll make it up.
>>
>>101565591
>Which sucks, because LLMs fucking suck at longterm buildup and payoff.
This is such a problem for me, I don't know how people do 500+ message chats with models that can't create and resolve a plot thread to save their lives. What could be done to fix it in training?
>>
>>101563535
>5-7 years
the world will not exist in 5 to 7 years
>>
>>101565537
>ask model about very specific and uncontroversial scientific topic
>get smart answer because mostly smart people talk about it
>think the model is smart
>ask about politically charged topic
>get retarded answer because terminally online retards talk about it
>think this is a smart answer
There are only political implications if you confuse a language model with a real person.
The main problem is that Silicon Valley grifters like to pretend that upscaling transformers is the path to AGI in order to get VC money.
>>
>>101565603
>the world will not exist in 5 to 7 years
Yeah... I've heard about that one for 2012, still waiting...
>>
>>101565591
>there's only so many ways to describe said cock sucking
That is my impression too. But I also have a dream that one day I will start a cock sucking session. Then tell her OOC that I don't like shivers down the spine. And gleams in the eyes. And you get what I mean right? All those phrases. Don't use those. And she will say OK, easily classify them in her "brain" and stop using them from that point on. I don't see this ever happening for any current LLM that predicts the next token.
>>
>>101565582
Ricky, listen to me. Don't be a dumbass. We're going to rob that datacenter (sips rum and coke) and you'll have all the A100s you want, OK? I've got it all worked out. Things are gonna change here, Ricky. No more living in a car.
>>
>>101565096
>L3-Euryale-2.1
You must be extremely retarded.
>>
>>101565632
Definitely. I would absolutely love to have my own fetish's various slops eliminated, it's 1000% true that mixing up the language even a bit can make things feel way fresher. Thankfully, even though the human brain is really good at recognizing patterns, it's also really easy to trick into thinking things are novel.
>>
>>101565622
Political implications was just a memeable figure of speech. But yes I meant roughly something like that - the next most likely token can be absolutely retarded and this is what all the models are learning.
>>
If you used Llama 3.1 on ollama, the latest ollama update patched the models, so you should redownload them.
>>
>>101565641
Ok drummer.
>>
>>101565694
go back
>>
https://poal.me/np0lsk
>>
>>101565721
how do i vote for go back?
>>
I should have broken Undi down into Undi(fr fr) and Undi(I am memeing)
>>
>>101565770
nah, unid wonned stay mad soa
>>
Can someone post PPL/KLD/MMLU benchmark comparisons for different mistral large quants? Trying to figure out if quanting it all the way down to IQ3XXS or even IQ2 is worth it
>>
>>101565721
3 serious votes and 12 meme votes. And people here think r/localllama is bad while /lmg/ is the actual place to discuss AI topics.
>>
Ok so what kind of rig will i need to run a 405B model?
>>
>>101565867
the userbase of r/localllama and /lmg/ is a circle. we just come here to shitpost because moderation is nonexistent here compared to reddit
>>
File: angry miku hatsune.jpg (73 KB, 736x973)
Will ooba ever fix the mistral-nemo loader issue?
>>
>>101565891
you're not supposed to say it tho
>>
>>101565887
at least one raspberry pi
>>
>>101565721
where is "none of the above"?
>>
Does anyone know what rope scaling actually does? I'm loading gemma-27b-it to 8k context with llama.cpp but am paranoid that it's making it retarded, I can't notice any difference but am constantly switching back and forth because I don't trust it...
>>
>>101565887
I can't imagine any commercial offering using such a large model. It takes a full H100 SXM5 server to run, and it's not like it's going to serve multiple prompts per second, so how do you monetize such a thing? It's just burning cash in a dick waving contest.
>>
>>101566009
It's retarded.
https://desuarchive.org/g/thread/101287708/#101295729
https://desuarchive.org/g/thread/101392789/#101398280
I noticed it with formatting, like with a markdown code block. At 8k it gets it right, but scaled it starts to make mistakes.
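For reference, the common "linear" RoPE scaling just compresses the position indices so a longer context maps onto the position range the model was trained on, which is also why fine position distinctions get mushier. A toy illustration, not llama.cpp's actual code:
[code]
# Toy illustration of linear RoPE position scaling, not the real llama.cpp implementation.
def rope_angles(pos: float, dim: int, base: float = 10000.0, scale: float = 1.0):
    pos = pos * scale  # scale < 1 squeezes positions, e.g. 0.5 maps a 16k ctx onto 0..8k
    return [pos / base ** (2 * i / dim) for i in range(dim // 2)]

native = rope_angles(8000, dim=128)              # a position the model actually trained on
scaled = rope_angles(16000, dim=128, scale=0.5)  # position 16000 now "looks like" 8000
print(all(abs(a - b) < 1e-9 for a, b in zip(native, scaled)))  # True
[/code]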
>>
File: file.png (139 KB, 1368x772)
>>101566053
>>
>>101566083
anecdotal, but fireworks is doing some shady shit. llama3 70b was underperforming on benchmarks last I tested it through their api
>>
>>101566056
>At 8k it gets it right, but scaled it starts to make mistakes
But its supported context is 4k. Are you saying it's fine scaled 2x to 8k?
>>
>>101566083
Is that per token?
>>
>>101566109
yes, $3 per token in, if you send a large document you have to pay a year's salary
>>
>>101566103
>But its supported context is 4k
no, it's 8k
>"max_position_embeddings": 8192,
https://huggingface.co/UCLA-AGI/Gemma-2-9B-It-SPPO-Iter3/blob/main/config.json#L19
>>
>>101566129
still probably worth it. 405b has to be smarter than the average salaryman
>>
File: 1721103827875.png (76 KB, 1850x175)
>>101566103
Yeah, it works at 8k without scaling.
>>
>>101566135
>>101566141
Damn I guess I'm just retarded then, I was switching back and forth between 4k and 8k trying to notice the difference and thought I was going insane, thanks
>>
All right, prices are beginning to descend from the delusional stratosphere on Volta. Check it: https://www.ebay.com/itm/335490833886
$1600 isn't bad if they're V100 32GB. If they're just 16GB, forget it, just continue to stack P100s.
>>
>>101566245
And that's the complete server? Just plug that whore in and you're good? I can't believe that's less than $2k. Holy shit.
Of course the obvious caveat is that it's loud and a power hog. But how bad?
>>
put the fucking base 405B on the openrouter or else, basterd
>>
>>101566245
>w/ 4x TESLA P100 SXM2
>if they're V100 32GB
...
>>
what's the writing style called where it's like:
>She walked into my office smelling of trouble, a dame on dagger heels.
like casablancian or something
>>
Who will become rich from founding a video generation company?
>>
File: 0_tADWew2sYSAk1pvM.jpg (20 KB, 640x480)
Which is better, Meta's Llama 3.1, or Google's Gemma 2?

I am just downloading Gemma 2 to run it on my machine. I just tried it online and it seemed to answer some questions better than Llama 3.1 (just for my particular questions - I'm sure Llama 3.1 is probably better for other types of questions)
>>
>>101566390
hardboiled / detective noir
>>
>>101566401
>I'm sure Llama 3.1 is probably better for other types of questions)
it's not. Gemma 2 mogs llama 3.1 in every way
>>
>>101566401
Gemma 2 spat out complete schizo shit in every language known to man when I ran it with llamacpp, let me know which settings you're using once you get it to work
>>
>>101566401
Gemma 2 is way better

Current tierlist is:
L3.1 70B
--small gap--
Gemma 2 27B
--big gap--
Nemo 12B
--tiny gap--
Gemma 9B = L3.1 8B
>>
>>101566401
In the <10B weight class they're probably around the same when it comes to being a general assistant. Gemma can be better for ERP. The issue with Gemma is the low context, which can limit your use cases.
>>
>>101566426
I have been using exclusively command-r-plus q4_k_m for months and suffering 0.6 tokens/s on cpu, gemma 2 27b is the only model I've tried so far (including 70b models) that is as good at reasoning as cr+, so I find it hard to believe l3.1 70b is better
>>
>>101566426
Is Gemma better than largestral
>>
>>101566264
1U is going to be very loud. I had an R720 with a pair of P100s, and that's a 2U, and it was loud since it needed a 60% fan offset to maintain the needed idle airflow.
There's nothing wrong with 1U, just keep it in a separate room behind a closed door.
>>
Do you guys prefer gemma 2 it or base?
>>
>>101566382
Well yeah, of course it conveniently leaves that out, so unless you get it in writing, assume it's 16GB V100s. Or maybe you can fuck them over on a NAD claim and get it for free.
>>
>>101566464
not even close
>>
>>101566496
are you genuinely retarded?
>>
>>101566401
Are you using this as some kind of evaluation or are there questions you ask an LLM because you really want an answer?
>>
>>101566521
>actually trying to be useful by determining use case
sir, we don't do that here, you just shill what you prefer.
>>
>>101566486
I can't find any gguf quants of the base model with llamacpp fixes so there's no way to know.
>>
>>101566486
No one uses or tests base models these days.
>>
>>101566508
They make V100 SXM2 in 16GB and 32GB variants, so no, go fuck yourself.
>>
Huggingface is driving me nuts with uploading my quant. I have git lfs enabled on the repo, I have the file broken up by llama-gguf-split, and still, I get the following error when pushing:
error: RPC failed; HTTP 413 curl 22 The requested URL returned error: 413
send-pack: unexpected disconnect while reading sideband packet
Writing objects: 100% (13/13), 28.39 GiB | 84.82 MiB/s, done.
Total 13 (delta 2), reused 1 (delta 0), pack-reused 0
fatal: the remote end hung up unexpectedly
Everything up-to-date

Ideas?
>>
>>101566563
It's because base models are usually incoherent and quick to lose track of what's happening at the normal sizes especially below 70b when you try to use them for chat. But gemma is SOTA and sort of on par with 100b+ models in terms of coherence, so I wouldn't be surprised if the base model can just be plugged into sillytavern and work perfectly.
>>
>>101566401
What's your use case? If it's RP and smut L3 absolutely sucks at it because they filtered out NSFW from training data. If it's coding then ChatGPT is free and also I wouldn't use anything less than SOTA for coding
>>
>>101566643
>But gemma is SOTA and sort of on par with 100b+ models
wow shills in overdrive today, you gonna claim it's better than largestral i guess?
>>
>>101566638
>error: 413
Maybe make the files smaller or upload them independently? I've never uploaded to hf, but I don't think receiving an error code that means "content too large" is a bug; it's pretty clear what it means
>>
>>101566638
I think you should use their own lib to upload to their repos in that case.
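Something along these lines with huggingface_hub should work (repo id and folder are placeholders); it handles chunking and retries itself instead of one giant git push:
[code]
# Sketch: upload the split gguf parts with huggingface_hub instead of raw git-lfs push.
# repo_id and folder_path are placeholders.
from huggingface_hub import HfApi

api = HfApi()  # assumes you're logged in (huggingface-cli login) or HF_TOKEN is set
api.upload_folder(
    folder_path="./my-quant",                 # directory containing the llama-gguf-split parts
    repo_id="your-username/your-model-GGUF",
    repo_type="model",
)
[/code]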
>>
>>101566657
Largestral literally just came out so I don't know. What I meant was that it's about as good as CR+
>>
>>101566668
the limit is supposed to be 50gb afaik, so his 29gb should be perfectly fine
>>
>>101566563
One big problem is that base models are too fucking schizo and loopy at the same time. Why would looping happen with base models even at high temperature (~1) anyway? I don't get that, wouldn't their wider token selection prevent it?
>>
>>101566680
And that's absurd, try harder please. If you'd said 70B it could at least perhaps be argued.
>>
>>101566703
>70b
lmao nice joke
Name 1 single 70b model that's even vaguely usable (except miqu)
>>
Are quants made with imatrix actually good? Should I be making all my quants using imatrix?
>>
>>101566670
I'll pull them down to my desktop and try shoving them up from there in their web interface.
>>
>>101566739
no
>>
>>101566643
gemma is good, but not so good that it jumps a size class. maybe if you have to choose between low-quant 70b and q8 gemma then yeah, but I run high quants of both and gemma is noticeably dumber, loses nuance and makes more mistakes
I would love to use a smaller, faster model if it gave the same or better results but it simply doesn't
>>
Babe wake up, a 1T parameter model just got released
https://huggingface.co/CofeAI/Tele-FLM-1T
>>
>>101566799
>create shitty 1T model
>claim it's good
>nobody can tell because nobody can run it
>>
>>101566813
low-key way of storing personal backups on huggingface servers
>>
>>101566799
> it has been trained on approximately 2T tokens
>"max_position_embeddings": 4096,
useless
>>
>>101566813
>>
>>101566830
i'd rather they use the compute on making their 52B better (10+T train, 32k+ ctx) sad to see really
>>
>>101566755
Can you elaborate? Does it make output more sloppy or something?
>>
>>101566053
sure, but quantized...
>>
>>101566830
>>101566799
>we've made extensive efforts to thoroughly clean and filter the training corpus for the model
>>
>>101566885
pedos are going to ruin videogen for us before we get a local model
>>
>>101566799
What rig do I need to run this at Q4?
>>
>>101566830
You don't need more than 2k context.
>>
>>101566813
Can I run it on my gtx 950?
>>
>>101566908
but how am i gonna have it write me an entire codebase overnight with just 512 ctx?
>>
>>101566927
Summarization, using 256.
>>
>>101566893
at least 20 3090s for 4bpw, 40+ for 8bpw
>>
>>101566942
you drive a hard bargain, those 64 tokens are really tempting...
>>
>>101566989
precocious 14 year old teen girl on the beach
>>
File: 11__00898_.png (1.21 MB, 1024x1024)
>>101566739
The importance matrix (imatrix) is what enables the really small quant types like 1-3 bits (IQ quants), and it can also be applied to regular K-quants to cut the quality loss a bit. It requires a separate calibration pass compared to traditional (static) ggufs, but for the end-user the resulting file is used the same way.
The low-bit IQ types do take a performance hit if you can't offload them entirely to GPU.
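The usual llama.cpp workflow looks roughly like this (binary names have changed across versions, older builds call them imatrix and quantize; the calibration text and paths are placeholders):
[code]
# Rough sketch of the imatrix workflow via llama.cpp's CLI tools; paths are placeholders
# and binary names differ between llama.cpp versions.
import subprocess

# 1) build an importance matrix from some representative calibration text
subprocess.run([
    "./llama-imatrix",
    "-m", "model-f16.gguf",
    "-f", "calibration.txt",
    "-o", "imatrix.dat",
], check=True)

# 2) quantize with it: required for IQ types, optional but helpful for K-quants
subprocess.run([
    "./llama-quantize",
    "--imatrix", "imatrix.dat",
    "model-f16.gguf", "model-IQ3_XXS.gguf", "IQ3_XXS",
], check=True)
[/code]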
>>101566638
Sounds like a sign from above anon.
>>
>>101564560
Had to in order to protect his ass legally for the future. He said he did so after a discussion with a lawyer. It is a grey area after all.
>>
>>101567028
I mean at quants like Q4. Will using imatrix give better output?
>>
>>101567050
>discord troon trying to associate miku with pedophiles
Jart lost, troons lost, ywnbaw neck yourself and leave
>>
How does Command R compare to Mixtral 8x7B merges like BagelMIsteryTour v2 or Nous Hermes 2 Mixtruct?
>>
>>101567094
It's similar to the difference back in the day between llama1 30b and 70b
>>
>>101567094
all three mogged hard by dolphin 2.5 literally local gpt4 (both good and bad)
>>
>>101567094
Why not use Nemo?
>>
>>101566425
I just ran Gemma 2 (9 billion parameters) on my laptop and it runs fine, but it's a bit slow, and it makes the laptop sound like a jet engine every time I submit a prompt, because the CPU suddenly spikes to >90C

Indeed I don't have a GPU on this laptop. Well there's an iGPU but it doesn't seem to be using that.
>>
are we still using mixtral for erp? or has anything changed?
>>
>>101567147
>Well there's an iGPU but it doesn't seem to be using that.
perhaps try to use llama.cpp's vulkan backend? might work?
>>
>>101566521
I'm looking at them because I have an idea for a website where an LLM could help with certain things. I'm wondering if I could run a model on a VPS. I think I could, but the LLM responses would take a couple of seconds if the VPS doesn't have access to a GPU. That might be okay though. Renting GPU time means more money of course.
>>
>>101567147
iGPUs are barely usable for this anyways due to how the memory split works
>>
I watch the flickering outputs of the model giving refusals and the code early-stopping and switching seeds/bumping temp and re-generating over and over, sometimes 10 times or so. Time wise it's all over in a second and the model falls in line, giving the requested answer.
The censoring attempts are futile. A mere inconvenience that adds a fraction of a second of overhead to an otherwise functional and efficient framework.
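Conceptually it's nothing more than this; generate() is a stand-in for whatever backend call you actually use, not a real API:
[code]
import random

# Conceptual sketch of the reroll loop described above. generate() is a placeholder
# for your local inference backend, and the refusal markers are just examples.
REFUSAL_MARKERS = ("i can't", "i cannot", "as an ai", "i'm sorry, but")

def generate(prompt: str, seed: int, temperature: float) -> str:
    raise NotImplementedError("call your local backend here")

def insist(prompt: str, max_tries: int = 10, temperature: float = 0.8) -> str:
    out = ""
    for _ in range(max_tries):
        out = generate(prompt, seed=random.randint(0, 2**31), temperature=temperature)
        if not any(marker in out.lower() for marker in REFUSAL_MARKERS):
            return out            # no refusal detected, keep this generation
        temperature = min(temperature + 0.1, 1.5)  # bump temp and reroll
    return out                    # give up after max_tries
[/code]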
>>
>>101567182
So, on the recent Intel stuff (Iris Xe), it seems to be able to access the full system memory. However, in my testing on my lil N305 system, the iGPU is not faster than the CPU. I guess the only advantage is you have no load on your CPU cores, but they're going to contend for the shared memory bandwidth, so unless they improve things, the iGPU is useless.
>>
I'm thinking of taking some small 2B model and expanding it to 1T parameters, but only the 2B parameters would ever get accessed by the network. The rest would be filled with encrypted personal data.
>>
>>101567121
Given comparable training quality I'd rather fill more of my GPU with model weights and less with context.
>>
>>101567223
>>101567223
>>101567223
>>
>>101567184
are we... the baddies?
>>
>>101567184
Teenagers are "children" only in the retarded angloamerican mind.
>>
File: Capture.jpg (36 KB, 725x771)
Poll will close any minute now. Sao is in the lead.


