/g/ - Technology


Thread archived.

File: ComfyUI_34510_.png (933 KB, 848x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103312983 & >>103298520

►News
>(11/26) OLMo 2 released: https://hf.co/collections/allenai/olmo-2-674117b93ab84e98afc72edc
>(11/26) Anon re-implements Sparse Matrix Tuning paper: https://github.com/HeroMines/SMFT
>(11/25) Qwen2VL integrated with Flux: https://github.com/erwold/qwen2vl-flux
>(11/25) Speculative decoding added to llama-server: https://github.com/ggerganov/llama.cpp/pull/10455
>(11/22) LTX-Video: Real-time video generation on a single 4090: https://github.com/Lightricks/LTX-Video

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/tldrhowtoquant

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: zzz.jpg (13 KB, 367x316)
►Recent Highlights from the Previous Thread: >>103312983

--Speculative decoding and its usage with various models:
>103313328 >103313336 >103313340 >103313551 >103313365 >103313440 >103313460 >103313463 >103313598 >103313658 >103313693
--Tulu model impressions and discussion:
>103313747 >103313769 >103313787 >103313802 >103313822 >103313853 >103313890 >103313917 >103313927 >103313950 >103313989
--Sentient: local personal companion with graph memory and agentic integrations:
>103313310 >103313339 >103313387 >103313484
--Recapbot test results and feedback:
>103315415 >103315532 >103315611
--OLMo discussion: new arch, 4k ctx, and Reddit data:
>103315697 >103315710 >103316008 >103315750 >103315847 >103315893 >103316010 >103316058
--OLMo 2 models and the state of open-source AI:
>103316073 >103316150 >103316245 >103316283
--LoRA's limitations and potential issues with fine-tuning:
>103313076 >103313114 >103313162 >103313220 >103313244 >103313313
--Discussion of kobold and booba alternatives, dev pace, and feature comparisons:
>103313177 >103313243 >103316052 >103313248 >103313345 >103313315
--Common failures and limitations of coding models:
>103316427 >103316470 >103316488 >103316513 >103316528 >103316524
--Choosing a draft model for speculative decoding with llama.cpp:
>103314138 >103314187 >103314222 >103314611 >103314742 >103314761 >103316743 >103316710 >103316739 >103314793 >103315098
--Autoround quantization and its performance compared to regular quant methods:
>103313507 >103313718
--Anons discuss language model performance and limitations, criticizing the focus on benchmarks and "meme marks":
>103313710 >103313835 >103316791 >103314144 >103314165 >103314228 >103315010 >103315109 >103315181 >103315206 >103315220
--Miku (free space):
>103313053 >103313132 >103313312 >103314097 >103314884 >103315109 >103316701 >103316754

►Recent Highlight Posts from the Previous Thread: >>103312989

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Okay, if Tulu is so good, why can't I find any mention of it on reddit?
>>
>>103317977
Because it's not drummer shilling his model?
>>
>>103317977
https://www.reddit.com/r/LocalLLaMA/comments/1gwl339/t%C3%BClu_3_a_set_of_stateoftheart_instruct_models/

https://www.reddit.com/r/LocalLLaMA/comments/1gz04zu/teleut_7b_tulu_3_sft_replication_on_qwen_25/
>>
Are you faggots doing that trolling again where you pretend that the model is good? Like with nemotron 70b?
>>
>>103318079
>nemotron
>trolling
Your either a retard or are the one trolling here. Nemotron is great, best until tulu for creative uses (and also the best 70B at coding but got surpassed by qwen2.5 32B coder)
>>
>>103318090
Got it.
>>
>>103318079
When are you idiots gonna realize it's a skill issue on your end. It's not that people are shilling models that are bad, it's that you're too stupid to use them properly. Retard in garbage out.
>>
>>103318112
>idiots
Pretty sure it's just one guy. He even tried to argue against using a model's correct formatting.
>>
Best inference engine for distributed compute GO
>>
>>103318153
no
>>
>>103318079
I tried it a bit but only at Q3 cause VRAMlet. The prose at least seems quite a bit better than the usual llamaslop
>>
>>103318079
It's more that they are dumb cavemen and ESLs who genuinely don't notice when the model says retarded or illogical shit after their dicks get hard.
>>
>>103318188
Unlike other llama tunes / mistral large it got complicated positions with a non human character correct and unlike qwen2.5 it is not dry, undetailed sex. And unlike any of those (mistral large is ok at it) tulu is creative and pushes the plot forward. I think your just a troll who has never used it, otherwise post this apparent logical error.
>>
File: dasdadadasd.jpg (75 KB, 645x634)
https://files.catbox.moe/ge639f.jpg
>>
>>103318224
>your just a troll
ESL confirmed.
>>
>>103318236
that's hot but can you make her flatter
>>
>>103318239
you're just a troll and also a grammar nazi
better?
>>
>>103318153
vllm
>>
>>103318236
that's hot but can you make the guy fatter
>>
>>103318236
That can't be good for her back
>>
>>103318366
uoh, nice
>>
>>103318366
peak
>>
>new thing drops
>mikutroons still shitting the thread
>>
>>103318449
post teto then, faggot
you won't
>>
File: ``.jpg (137 KB, 832x1216)
>>
>>103318449
I like them cuz they make you seethe.
>>
>>103318236
>>103318366
>>103318460
>>>/g/ldg
>>
>>103318462
>I am a mentally ill troon cause it makes you seethe
>>
>>103318475
Wow. That's a creative insult. Well done. I tip my hat in your general direction.
>>
>>103318449
The only one I see shitting here is you
>>
File: migusalad.jpg (131 KB, 1216x832)
>>
When generating text I get something like 3 t/s but when generating code I'm seeing from 3.25 to 3.6 t/s
This draft thing is like the second coming of miqu
>>
>>103318090
>Nemotron is great, best until tulu for creative uses (and also the best 70B at coding but got surpassed by qwen2.5 32B coder)
Qwen let me down on non-trivial stuff that L3 tunes take a decent shot at.
>>
>>103318513
That is because code has less valid options so the draft model should be correct more often.
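To make that concrete, here's a toy greedy sketch of the accept/verify loop (my own illustration in python, not llama.cpp's actual code; draft_next/target_next are made-up stand-ins for the two models):
[code]
# Draft proposes k tokens cheaply, target checks them, only the agreeing prefix survives.
def draft_next(ctx):               # stand-in for the small draft model
    return (sum(ctx) * 7 + 3) % 50

def target_next(ctx):              # stand-in for the big target model
    n = (sum(ctx) * 7 + 3) % 50
    return n if n % 4 else n + 1   # disagrees with the draft now and then

def speculative_step(ctx, k=8):
    proposed, tmp = [], list(ctx)
    for _ in range(k):             # k cheap draft passes
        t = draft_next(tmp)
        proposed.append(t)
        tmp.append(t)
    accepted, tmp = [], list(ctx)
    for t in proposed:             # IRL the target verifies all k positions in one forward pass
        good = target_next(tmp)
        if good == t:
            accepted.append(t)
            tmp.append(t)
        else:
            accepted.append(good)  # first mismatch: keep the target's token and stop
            break
    return accepted

print(speculative_step([1, 2, 3]))  # [45, 10, 30, 41] -> 3 drafted tokens kept + the target's fix
[/code]
The fewer plausible continuations there are (code, JSON, math), the longer the accepted prefix gets, which is exactly why code speeds up more than prose.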
>>
>>103318513
For maximum speed, don't forget to disable the repetition penalty, use temperature 0 and set "--draft-min 1"
>>
>>103318536
>use temperature 0
At super low temperature I've had some models make factual mistakes that they don't make with some temp (0.2ish) and a savage Top-P (like, 1).
>>
>>103318525
and qwen coder got a lot of more complicated stuff L3 tunes failed at for me. Try deepseek R1 as well, that will be next level if they ever release it.
>>
>>103318562
At some point I hope to put together a more comprehensive programming test set, right now it's just some random Python demo shit and a particularly tricky Java question. Most L3's make the mistake but correct it when called on the error. A few, including Nemotron, caught the problem and described it before generating the code suggestion. Coder 32B doubled down on being wrong by offering a fix that made the mistake worse.
>>
>>103318559
Anon... I...
>>
>>103318612
I too started replying to newfriend over there several times but decided to just move along.
>>
>>103318079
It's one or two at most, both might samefag at the same time, hard to tell, /g/ needs IDs badly.
>>
>>103318719
Nah, it's the same guy. He calls literally everyone a shill any time some new model gets recommended and then starts claiming they're shit without any logs to back it up.
>>
>>103318731
you have misunderstood the post you were replying to, it was about the opposite of that guy
>>
Why can't the new Mistral Large do punctuation at all? It just keeps messing up ** or quotation marks for no reason. Yes, I have adjusted the prompt format.
>>
>>103318777
Using DRY and/or XTC?
>>
File: yuuga.png (876 KB, 768x825)
>>103318559
https://artefact2.github.io/llm-sampling/
>>
>>103318777
That sounds like a tokenizer config problem, it doesn't do that for me.
>>
>>103318837
There he is.
>>
>>103318837
What's with the accents? Did they finally put a filter on your spam?
>>
>>103318837
Kek keep it up bro
>>
>>103318785
Just Temp 1 and min-p 0.03
>>103318803
I tried two different quants. It just keeps doing it.
>>
R1 seems impressive.
>>
>>103319043
Post the second part NIGGER
>>
>>103319043
>>103319060
It is certainly a creative choice but it works.
>>
>>103319074
this is illegal
>>
>>103319085
It BTFOs O1. Openai is dead if they release this.
>>
>>103319074
Deepseek won. Let's see Meta's strawberry.
>>
cydonia-22b-v1.3-q5_k_s.gguf runs great on my computer.
What is another ~22B q5 model but built for programming? I need something that can assist me quickly with code, but locally....
GPT 4 ain't bad, but I wonder if leaving out all the bullshit and training the model just on the software development process is enough to keep it compact.
>>
Star Attention: Efficient LLM Inference over Long Sequences
https://arxiv.org/abs/2411.17116
>Inference with Transformer-based Large Language Models (LLMs) on long sequences is both costly and slow due to the quadratic complexity of the self-attention mechanism. We introduce Star Attention, a two-phase block-sparse approximation that improves computational efficiency by sharding attention across multiple hosts while minimizing communication overhead. In the first phase, the context is processed using blockwise-local attention across hosts, in parallel. In the second phase, query and response tokens attend to all prior cached tokens through sequence-global attention. Star Attention integrates seamlessly with most Transformer-based LLMs trained with global attention, reducing memory requirements and inference time by up to 11x while preserving 95-100% of accuracy.
https://github.com/NVIDIA/Star-Attention
From Nvidia. improvements over ring attention mostly in speed.
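If the two phases are hard to picture from the abstract, here's a toy single-process numpy sketch of the idea (my own simplification for illustration; the real thing shards blocks across hosts, prepends an anchor block and merges softmax online):
[code]
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    return softmax(q @ k.T / np.sqrt(q.shape[-1])) @ v

rng = np.random.default_rng(0)
d, block = 64, 256
ctx = rng.standard_normal((1024, d))   # context token states (stand-in for one layer's K/V)
query = rng.standard_normal((16, d))   # query/response tokens

# Phase 1: blockwise-local attention over the context, block by block
# (each block would live on a different host and run in parallel).
kv_cache = []
for start in range(0, len(ctx), block):
    blk = ctx[start:start + block]
    _ = attention(blk, blk, blk)       # local attention only, no cross-block scores
    kv_cache.append(blk)               # each host keeps its own KV cache

# Phase 2: query tokens attend to ALL cached tokens (sequence-global attention).
k_all = np.concatenate(kv_cache, axis=0)
out = attention(query, k_all, k_all)
print(out.shape)                        # (16, 64)
[/code]
Phase 1 chops the quadratic cost into independent per-block pieces; phase 2 is ordinary full attention but only for the short query/response.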
>>
Low-Bit Quantization Favors Undertrained LLMs: Scaling Laws for Quantized LLMs with 100T Training Tokens
https://arxiv.org/abs/2411.17691
>We reveal that low-bit quantization favors undertrained large language models (LLMs) by observing that models with larger sizes or fewer training tokens experience less quantization-induced degradation (QiD) when applying low-bit quantization, whereas smaller models with extensive training tokens suffer significant QiD. To gain deeper insights into this trend, we study over 1500 quantized LLM checkpoints of various sizes and at different training levels (undertrained or fully trained) in a controlled setting, deriving scaling laws for understanding the relationship between QiD and factors such as the number of training tokens, model size and bit width. With the derived scaling laws, we propose a novel perspective that we can use QiD to measure an LLM's training levels and determine the number of training tokens required for fully training LLMs of various sizes. Moreover, we use the scaling laws to predict the quantization performance of different-sized LLMs trained with 100 trillion tokens. Our projection shows that the low-bit quantization performance of future models, which are expected to be trained with over 100 trillion tokens, may NOT be desirable. This poses a potential challenge for low-bit quantization in the future and highlights the need for awareness of a model's training level when evaluating low-bit quantization research.
mostly just putting the work in to prove what we all already knew. slop tokens are really going to have to be more vigorously deleted from datasets
>>
Has no one made progress in maintaining the characters personality past context limit? It's annoying.
>>
File: 1732680317640.jpg (237 KB, 1080x1260)
tf
>>
Tulu does the thing all assistantslop models do where it favors a SFW word in logit probabilities even when a NSFW word would be the more obvious choice.
Like if you give it "I'm going to milk all the..." in the middle of an obviously sexual context, Tulu's most probable token for the next word is 'stress' rather than 'cum'. It shies away from the smut word, substituting it with a technically-plausible but unlikely SFW one. That's corpo model behaviour, no smut tune would do that. This is unusable for coomers, regardless of what you guys say.
>>
>>103319228
Hi Drummer
>>
File: NoU.png (431 KB, 1212x2084)
>>103319228
Use the author's note I posted last thread. Tulu is filthier than any other model out there and not a shiver to be seen. And unlike said "smut tunes" it's not retarded
>>
>>103319268
Meds. Drummer unironically bought an ad, he doesn't shill from the shadows like that.
>>
>>103319277
This screenshot literally proves my point, are you drunk? It doesn't use any crude smut slang terms at all, it's all purple prose and euphemisms like a romance novel.
>>
>>103319220
>added another 9 out of nowhere
>couldn't calculate 99-9
Was it o1-mini?
>>
>>103319277
>Filthy
>Fill me with your seed
Lol
>>
File: Untitled.png (334 KB, 1122x1549)
Pushing the Limits of Large Language Model Quantization via the Linearity Theorem
https://arxiv.org/abs/2411.17525
>Quantizing large language models has become a standard way to reduce their memory and computational costs. Typically, existing methods focus on breaking down the problem into individual layer-wise sub-problems, and minimizing per-layer error, measured via various metrics. Yet, this approach currently lacks theoretical justification and the metrics employed may be sub-optimal. In this paper, we present a "linearity theorem" establishing a direct relationship between the layer-wise l2 reconstruction error and the model perplexity increase due to quantization. This insight enables two novel applications: (1) a simple data-free LLM quantization method using Hadamard rotations and MSE-optimal grids, dubbed HIGGS, which outperforms all prior data-free approaches such as the extremely popular NF4 quantized format, and (2) an optimal solution to the problem of finding non-uniform per-layer quantization levels which match a given compression constraint in the medium-bitwidth regime, obtained by reduction to dynamic programming. On the practical side, we demonstrate improved accuracy-compression trade-offs on Llama-3.1 and 3.2-family models, as well as on Qwen-family models. Further, we show that our method can be efficiently supported in terms of GPU kernels at various batch sizes, advancing both data-free and non-uniform quantization for LLMs.
actually compares to quip# and QTIP. lower PPL than quip# but QTIP is better. faster inferencing than both. iirc quip#/QTIP take forever to actually quantize but didn't see anything in this paper at a quick glance for how long it takes either. only some pseudocode no github but hey new day new quant method.
https://github.com/BlackSamorez
git of one of the main authors. hidden repo worked on recently so probably will be open sourced there
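The "Hadamard rotation + grid" idea itself is simple to picture. The sketch below is just my own illustration of the concept with a dumb uniform grid (needs scipy), not the paper's MSE-optimal grids or kernels:
[code]
import numpy as np
from scipy.linalg import hadamard

rng = np.random.default_rng(0)
d = 256
W = rng.standard_normal((d, d))            # pretend weight matrix

# Random Hadamard rotation: H @ diag(signs) / sqrt(d) is orthogonal,
# and it makes the rotated entries look much more Gaussian / outlier-free.
R = (hadamard(d).astype(np.float64) @ np.diag(rng.choice([-1.0, 1.0], size=d))) / np.sqrt(d)
W_rot = W @ R

def quantize_uniform(x, bits=4):
    # plain per-tensor uniform grid; HIGGS derives MSE-optimal grids instead
    levels = 2 ** bits
    lo, hi = x.min(), x.max()
    step = (hi - lo) / (levels - 1)
    return np.round((x - lo) / step) * step + lo

W_hat = quantize_uniform(W_rot, bits=4) @ R.T          # quantize in the rotated basis, rotate back
print(np.linalg.norm(W - W_hat) / np.linalg.norm(W))   # relative error of the round trip
[/code]
Data-free in the same sense as the paper: no calibration data, only the weights get touched.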
>>
>>103319292
No, this is just plain old GPT-4o
>>
What Tulu is best Tulu?
Is 70B any good?
>>
>>103319303
Wouldn't it be more fair to compare it with CoT model?
>>
File: Hows this.png (171 KB, 1214x923)
>>103319295
This is with the theme of MLP so that prob hampers the dirty talking there. I explicitly told it to use language fitting the universe, as can be seen by her saying buck instead of fuck. Here's something else.

>>103319318
Best RP model atm, not sure about sfw capabilities, benchmarks say it's close to qwen2.5 72B
>>
>>103319333
Okay that's still pretty purple but I concede it's closer since it actually said 'pussy' and 'clit' this time instead of some SFW euphemism.
>>
>>103319327
I mean, yeah, but I didn't expect 4o to fail this hard.
>>
File: Hows this2.png (177 KB, 1207x855)
>>103319338
Here, I told it to be vulgar.
>>
>>103319277
>>103319333
Thanks for the screenshots, appreciated. Almost nobody shows anything anymore.
Feels a bit like qwen to be honest. Probably the llama 3.1 base.

I really hope we can get models that don't have fucked up context and this horrible stretching out of a simple sentence.
Like get to the point. No wonder aicg fags are 80%+ femoids. They probably love this shit.

>ILLTELLYOUEXACTLYHOWIWANTYOUHOWINEEDYOUTOTAKECONTROLTOLETGOCOMPLETELY.ILLWATCHYOULOOSEYOURSELFINPLEASEUREYOURBODYSHUDDERINGYOUREYESGLAZEDWITHLUSTANDWHENYOUCANTTAKEITANYMOREWHENYOURONTHEEDGEILLPULLYOUCLOSE!
Imagine being an undervolting 30GB vramlet and having to watch this shit roll in at low t/s.
But they almost all have this problem; there are fundamental problems. In b4 prompt skill.
>>
>you can use the shitty aya 8B model as a draft model for command-r-plus
>it's so unbelievably retarded due to the multilingual stuff that it barely manages to predict anything
Should've figured...
>>
>>103319354
So this is how I should be talking to women huh.
>>
File: nala.png (136 KB, 789x720)
tulu 70b nala test. dunno if anyone else has done this yet. haven't been here often recently.
>>
>>103319378
>femoids
>on 4cück
Cute retard.
>>
how much are vramlets missing out? I have a 12gb 3060 with 16gb ram and there are obviously models that couldn't possibly load on my current system. Those 35gb, 40gb models, how much more 'fun' are they compared to the 10gb I'm forced to run?
>>
>>103319392
>ahh ahh mistress in quotation marks
ruined
>>
File: Hmm.png (84 KB, 1213x896)
>>
File: unknown.png (168 KB, 574x550)
a-at least she's accommodating
>>
>>103319395
femoids and faggots over there, it's pretty obvious.

>>103319397
Not much to be honest. I would pay $10 in crypto on openrouter and try them out first if you like those.
I got myself a p40 because I wanted to try 70b models.
What you are missing out on is higher context for nemo and mistral. (They slip into repetition at higher context anyway)
Even the 70b tunes are all positivity sloped, I suppose because they train on (old) gpt outputs and that's difficult to get out.
In general it feels like the higher the B, the more assistant.
I tried mistral large on openrouter, that's probably what everybody wants. But you need like 3 3090s to run that.
Running at Q2XX is a crime; it doesn't follow the format etc. anymore.
Hope we get nemo scaled up to like 30B. It's smart for its size and really the only model that's fun.
Mistral small is smarter but also more assistant unfortunately.
Some people swear by the gemma 27b magnum tune, but I don't see much difference from mistral small with only 8k context.
>>
>>103319463
>Not much to be honest.
This is only true if you do the most simple of shit. Try anything more complicated than an RP with a humanoid character and you will see the differences, especially when you get to cards including game systems.
>>
>>103319497
I found that for more complex cards nemo shits the bed and you need mistral small.
Mistral small also reliably keeps track of stats, which is really cool.

I just don't like using the bigger models because the writing is so bad. It's just not fun to use.
And you still get retardation anyway. Like 72b magnum has stuff like thinking you can get pregnant from anal, trips up with size differences etc.
Bigger models seem to do cards with multiple characters better though.
>>
>>103319530
>the writing is so bad
Magnum fixed it but made it retarded. Nemotron fixed it without doing that, tulu fixed it further imo.
>>
>>103319411
>as slutty as possible = sass and sultriness
you cannot escape llama3's positivity bias
>>
>>103319567
>>103319354
>>
>>103319567
Yeah with corpo/assistantslopped models the positivity bias seems to make them interpret "be slutty" as meaning "be a sassy girlboss". It's always kind of belittling and haughty and vaguely dommy and contemptuous. Cuck/femdom enjoyers probably like it I guess.
>>
>>103319567
true. talking in a husky voice.
>>
>>103319585
What does slutty mean to you, lol? Another swipe had it make each "sort" a "stroke" and talk about a dripping pussy.
>>
File: 21.png (42 KB, 1102x602)
>>103319567
>>103319585
>>
>>103319630
nemo once again proves that big models are soulless memes
>>
>>103319642
That is tulu 70B
>>
>>103319630
>swap them like a woman swapping partner at a wild party.
>like a seasoned player sorting through her conquests
uhhhh yeah, its slop time baby. *anon whips out his dick again*
>>
>>103319576
Well it's better than I expected, at least it has 'obscene' words in it. It's lacking the bite of '''dark rp''' but it might be possible to prompt around it.
>>
File: ItsBeenALongDay.png (1.03 MB, 1280x768)
Good night, lmg
>>
>>103319784
rape u soon
>>
Has anyone managed to get Qwen2-VL-Flux to work?
It seems like it could be a great way to improve flux, but 48 GB of VRAM is quite heavy.
>>
>>103319826
>Lets make flux even slower!
>>
File: 1729076575056262.png (1.77 MB, 1188x712)
>>103319784
Omggg its migu1!1
>>
>>103319844
i will not rape that
>>
>>103319826
Where is it? I do want to give it a try
>>
>>103319354
What depth were you using your AN on?
>>
>>103319884
the default, 4
>>
>>103319277
What are your regular sampler settings and context? I’m skeptical because I just tried it and got shivers/husky within the third response but maybe mine are off.
>>
>>103319893
I normally just do something like 0.05 min p and / or maybe 0.95 / 0.9 top A instead
>>
>>103319904
top A
Sorry, meant top P
>>
>>103319784
Good night tired Miku
>>
>>103319431
>you can make me your personal trainer
Waitaminit, just what's the big idea here...?
>>
I saw some posts the other day that brought up what really makes Claude great, which is that it understands subtleties and goes hard into them. If the character is supposed to be a vocaloid fan, it will proactively weave into its response the kind of things that real fans would actually say, and not in a fake "hello fellow kids" way like most models do. But what is the way to solve this for open weight models? Tulu 3 seems to show that we can now do something very proactive and it's open source so we can reproduce it, so we now just need pretrained models that are really trained on a ton of real human data from the worst parts of the internet. Then the model will know what they are like, so it can act like them with fine tuning.

New Mistral base model when? Ideally 70B.
>>
>>103319431
>blurred out wall of text schizophrenia
and this is supposed to be impressive?
>>
>>103319971
Claude (esp. Opus, Sonnet is very good but it doesn't have big model smell) is also the only model that is often genuinely funny in an original way, I want to know what the secret sauce is. One time it described a clumsy French kiss as "her tongue wriggled in his mouth like a tased eel", which from a Google search seems to be a totally original simile.
>>
>>103319909
Oh maybe it's my context settings, what are you using for the story prompt? default or llama3?
>>
>>103319991
>I want to know what the secret sauce is.
Being trained on the entire internet and a fuck ton of books / other stuff. Claude knows obscure stuff only on fanfiction websites and in copyrighted works.
>>
>>103320001
Neither, tulu's formatting for instruct, no tags in the context template, just some stuff like "User character:".
https://files.catbox.moe/qvn0g3.json
>>
My guess is that Claude is both a huge total param model so it can store all that information about every single thing humans have come up with, but a MoE so it is still fast.
>>
>>103320003
You can tell it's the only corpo model that has 4chan in the pretraining data too because it's the only one that can generate plausible-looking 4ch threads rather than something that reads like a redditor's satire of 4chan.
>>
File: file.png (36 KB, 323x315)
>>103320010
I appreciate the help and it's good confirmation that what I was using was correct after popping open the config file. But I was referring to pic rel
>>
>>103320010
That json is wrong. It's <|end_of_text|> not <|endoftext|>
>>
>>103320010
I also don't think system_suffix should be there. There is no <|eot_id|> in the chat template, period.
>>
>>103320027
Also this. More params = more room to "soak" up smaller details from everything I've seen. It's why I'm personally excited for deepseek's next model. 2.5 knows a ton about everything the same as claude but is sadly incredibly dry (and it's a giant moe). R1 seems to fix that from what I've been able to play with in getting around the filter on the site. Here's hoping they do end up releasing the weights of the full model.

>>103320047
>>103320055
Woops lol, might explain it not stopping till the max allowed response length.
>>
>change a single character in the last message
>lmao time to process the entire context again
I'm getting really fucking tired of this bullshit.
>>
>>103320060
Also in case you wonder, Tulu does NOT have <|assistant|> or <|user|> or <|system|> as actual tokens. Those literally come through as (e.g.) '.< (16134)', '| (91)', 'assistant (78191)', '| (91)', '>\n (397)'

My assumption is they forgot to add them, or the tokenizer is wrong, but the model works so eh, whatevs.
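Easy to check yourself with transformers if you have the repo handy (repo id below is the 70B instruct one linked later in the thread, swap in whatever you actually use):
[code]
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("allenai/Llama-3.1-Tulu-3-70B")
print(tok.tokenize("<|assistant|>"))         # several ordinary pieces -> not a registered special token
print(tok.tokenize("<|start_header_id|>"))   # should come back as a single piece if it IS registered as special
print(tok.encode("<|assistant|>", add_special_tokens=False))
[/code]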
>>
>>103320096
Huh... it works so did they train it on that instead of using special tokens for some reason?
>>
>>103320130
Yes, I strongly suspect they trained it on that. Happens all the time.
>>
>>103320181
>>103320130
>>103320096
Ahhhhhhhhhhhhhhh
>>
>>103320096
>>103320130
>>103320181
>>103320256

Same thing with my models trained on Metherme. I didn't bother adding them.

But they used <|assistant|> instead of <|model|> which, from my experience, works wayyy better in L3.1

How's Tulu?
>>
>>103320280
>How's Tulu?
Nemotron 2. It's fixed llama 3.1 but smarter. Feels more like qwen2.5 72B BUT without the positive bias / lack of sexual knowledge
>>
>>103320315
Any issues with it so far? Is it worth finetuning on top of?

I see a lot of Tulu variants, which one works best for our use cases?
>>
>>103320280
>But they used <|assistant|> instead of <|model|> which, from my experience, works wayyy better in L3.1
Tulu was trained on the base model, retard
>>
>>103320332
Without context it likes doing the claude thing of adding some OOC stuff, but that's generally actually cool to have imo, adds some personality. It quickly quits that with some context / an author's note telling it what to do though.
>>
>>103320332
>>103320350
Oh and tulu 3 70B instruct is the only one I've used. The "final" one I guess.
>>
>>103320333
?
>>
>>103319301
No offense but there are so many quantization methods being released. Do any of them matter? What happened to SpQR, or SqueezeLLM, or RPTQ, or any of this other shit?
>>
>>103319043
How many parameters will this have? And why will I need a rig with 10000 VRAM to run this at 1.3 tokens per second?
>>
>>103320332
The training recipes and datasets are public, you numpty. AllenAI's whole purpose as an organization is being one of those rare companies that does that and replicating shit like OpenAI before Sam Altman fucked it all up. Even if you can't afford their compute power, you can actually learn a thing or two from them.
https://huggingface.co/collections/allenai/tulu-3-datasets-673b8df14442393f7213f372
>>
>>103320073
i have no clue what you're talking about but i use exllamav2. what do you mean by "time to process the entire context again"?
>>
>>103320718
>i have no clue what you're talking about but i use exllamav2. what do you mean by "time to process the entire context again"?
>>
Given dataset D, original model X, and resulting LoRA L(X) = X', is it possible to produce/estimate another LoRA L'(X') = X, assuming you have all the above elements?
The inverted LoRA would, if applied to the model with the original LoRA merged into it, result in (approximately) the original model.
I know mergekit supports LoRA extraction, but we have more information to work with here, and I wonder if it makes a difference.
>>
File: coconot.png (55 KB, 940x282)
>>103320712
Forgot to link the code.
https://github.com/allenai/open-instruct
And just FYI, there is no easy uncensored Tulu 3 you can just fine tune on top of; the initial SFT already has safety datasets baked into the training regime, like pic related, so it's already braindamaged out of the gate. You can see a full list here.
https://huggingface.co/datasets/allenai/tulu-3-sft-mixture
>>
>>103320718
do you know what 'context' is in relation to LLMs?
>>
Does someone have context and instruct templates for the Tulu uwu to use it with my Silly tavern?
>>
>>103320753
If you have the original model, and the adapter weights of the LoRA, then the "inverted LoRA" would just be the negative of the delta weights, no? The delta being the difference between the adapter weights and the original model weights.
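Right. Since merging is just W' = W + (alpha/r) * B @ A, applying the negated delta to the merged model gives you the original weights back exactly (up to float error). Tiny sketch with made-up shapes:
[code]
import numpy as np

rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 128, 64, 8, 16

W = rng.standard_normal((d_out, d_in))        # original weight
A = rng.standard_normal((r, d_in)) * 0.01     # LoRA down-projection
B = rng.standard_normal((d_out, r)) * 0.01    # LoRA up-projection
scale = alpha / r

W_merged = W + scale * (B @ A)                # model with the LoRA merged in

# "Inverted LoRA": keep A, negate B (equivalently, negate the whole delta).
W_restored = W_merged + scale * ((-B) @ A)
print(np.allclose(W, W_restored))             # True
[/code]
mergekit-style extraction (an SVD of W' - W, as far as I know) only matters when you no longer have the adapter itself; if you still have A and B there is nothing to estimate.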
>>
>>103320838
Sorry for the "uwu" shit I'm using my brother phone and the auto suggestion put this word.
>>
>>103320764
I've been playing with the 8b version and this shit is completely nerfed. Just paragraphs and paragraphs of preaching
>>
Metharme seems to work well with Tulu 70B... And it doesn't trip up safeties when running a NSFL card.
>>
File: tulu 8b.png (97 KB, 953x427)
>>103320906
>Just paragraphs and paragraphs of preaching
>>
>>103321188
Maybe I'm missing something, is there something uniquely different about this version? https://huggingface.co/mradermacher/Llama-3.1-Tulu-3-8B-DPO-i1-GGUF
What are your settings?
>>
Hey /g/uise.
It's my first attempt at running a local LLM and I'm using llama 3.1 70B.
I know it's a huge model but apparently, it should be okayish to use with a 4090.
The thing is that mine is stuck at generating a response. My GPU is at 0% usage too. My CPU isn't being used either so I don't know what's happening.
>>
kind of crazy to think about how ai is a solved science and with a couple more gens of nvidia chips and a few years of datacenter and power infra expanding we will be able to just use the current algorithms to create agi
>>
>>103320841
Thanks that’s what I was hoping. Is there an easy way to negate them? I’ll have to give it a try.
>>
>>103321379
This is not the first time in history that people have thought this way.
>>
This speculative decoding thing is a sham, the generation speed is the same. Fuck, why did I think I could run 70B faster with it.
>>
What's the best model in the 7-12B range?
>>
>>103319301
it doesn't really matter if it's not GGUF compatible, we're not gonna use a new backend because of a new quant, it needs to be working on llama.cpp
>>
>>103321379
>>103321458
>Everything gets processed in relation to everything else
That's attention/consciousness/understanding/intelligence
>The system makes a choice
That's prediction/behavior/agency/action

The more philosophical handwringing people do the further they get from the truth of what's actually happening. We've solved the general notion of 'general intelligence' we just need to make it better.

LeCunny posters need not reply
>>
File: 1718579580274365.jpg (52 KB, 682x875)
there's literally nothing else to do in life other than wait for the next sota foss ai tool to drop
>>
>>103321532
all are shit, gemma 9b
>>
>>103321351
read the docs of whatever tool you're using to run it then
>>
File: KGZSgwriSA2r3Cmr4NtWPw.png (1.56 MB, 736x1312)
How many big leaks have there been?
>llama1
>Miqu (Mistral Medium)
Is that it? How many Yuan to get a corpo researcher to drop a sonnet/4o on HF
>>
>>103321188
>mix of defiance and vulnerability
>>
>>103321640
llama was meant to be published anyway, hardly a leak
miqu was an ok-ish leak; we got largestral 1 after that, which was the proper model and not undertrained, although you could argue that miqu pushed mistral to release largestral 1 sooner because of it
>How many Yuan to get a corpo researcher to drop a sonnet/4o on HF
those are guarded as close as any other high level business secret, only a few people have access and even that access is gonna be closely monitored

it doesn't make sense for anyone to also leak it and face life consequences for what, a model that will be obsolete in half a year or a year? if you want to leak something, better to leak the secrets used for training that are already in your brain, but that is already happening when people leave the company to make their own lol
>>
>>103321640
SD leaked originally, and then NAI's finetune of it leaked later
>>
>>103321522
Unironically a skill issue, WOMM
>>
>>103321675
>My retarded jeetware grift is your skill issue
k
>>
>>103319115
Try to fit some lower quants of Qwen-coder-32B, nothing comes close locally
>>
>>103321657
The same could be said for Windows, GTA, and other code leaks from the past.
>>
>>103321731
those leaks aren't of as much influence
>>
>>103321522
You should use a speculative model with the same vocabulary as your big model if you're using a draft model. Also ngram speculative decoding is not good outside of summarization.
>>
What version of Tulu am I supposed to use?
>>
>>103321752
Weight leaks wouldn't matter, they can release their current models at any time and it won't change anything. The real value isn't the weights but their tools and inference infrastructure. Very few people would bother hoarding 3090s just to host those models, companies able to afford datacenters to compete on inference will not risk their businesses by running leaked models, and there isn't a way to somehow decompile the weights to improve other models.
>>
>>103321775
I'm using Tulu 70B Q3_K_M with Tulu 8B IQ3_XS as draft. I get 2.22 t/s without speculative, 2.12 t/s with it.
Also have k=1 and temperature=0.3
So, does speculative decoding not work as well for creative stuff?
>>
>>103321810
>and there isn't a way to somehow decompile the weights to improve other models.
you have access to infinite dataset creation from that good, now fully uncensored model, distillation to create good small models and many reverse engineering tools to figure out what they did to make the model work as well as it did, you can finetune it further etc
>>
>>103321823
>I get 2.22 t/s without speculative, 2.12 t/s with it.
>Q3_K_M
Check that you're not spilling out into RAM.
>So, does speculative decoding not work as well for creative stuff?
It doesn't, because there is so much more variation that it is more likely for the draft and the actual model to diverge. It works best for coding, constrained grammars, math, etc. where there is close to only one possible continuation.
>>
>>103321848
And anything you can do with it will be still inferior to the original model
>>
>>103321955
if that were true finetuning wouldn't exist, given that it does...
>>
>>103321823
Try to set --draft-min 1 and report back if that improves it for you, for me it was a night and day difference.
>>
https://www.anthropic.com/news/model-context-protocol
Anthropic is creating an open standard allowing models to communicate with resources to request information. They're making it open source so it may be relevant for /lmg/ as well.
>>
>>103322176
buy an ad
>>
>>103317922
> still no good realtime voice model.
> still no long term memory that doesn't suck
ngmi, where is my migu
>>
This should be required reading before posting here
https://huggingface.co/DavidAU/Maximizing-Model-Performance-All-Quants-Types-And-Full-Precision-by-Samplers_Parameters
>>
File: 1703551617588932.png (1.31 MB, 1024x1024)
>>103322186
>where is my migu
here
>>
>>103321927
I am offloading layers into ram in both cases, but was hoping it would still help and get something closer to 3T/s
>>103322029
That helped a bit, got 2.49 T/s
Raising the top K and temp to what I normally use brought it down to 2.38 T/s

So I suppose storywriting is not the case where you would get 30% speedup. Sad.
>>
is a single p40/p100 worth it? i wanna run only small models like 13b at q5km, maybe 20b or smth.
>>
>>103322205
>Instead of using a q4KM, you might be able to run an IQ3_M and get close to Q4KM's quality, but at a higher token per second speed and have more VRAM for context.

>The LOWER the quant the STRONGER the Imatrix effect is, and therefore the stronger the "tint" so to speak

>Due to the unique nature of this project, quants IQ1s to IQ4s are recommended for maximum effect with IQ4_XS the most balanced in terms of power and bits.

>[ https://huggingface.co/DavidAU/L3-Stheno-Maid-Blackroot-Grand-HORROR-16B-GGUF ]

>[ https://huggingface.co/DavidAU/MN-DARKEST-UNIVERSE-29B-GGUF ]

>[ https://huggingface.co/DavidAU/MN-GRAND-Gutenberg-Lyra4-Lyra-23.5B-GGUF ]

>ANTI-SLOP - Kolbaldcpp only

>"For my prompt below, explain the steps you wound take to execute it" (prompt after this line)

>This will help the model fine tune your prompt so IT understands it.

I see.
>>
>>103322205
I tried reading it and almost had a stroke
Tell the guy to actually make it readable and we can talk
>>
>>103322205
>according to his testing
>no actual testing methodology posted
>recommends meme merges
yeah required to read so you know what grifter and pseud looks like
>>
>>103320819
yes but i never have to process anything. its instant. been like this for me since the early koboldai days.
is context processing a llama.cpp meme? why use something that takes 10 years?
>>
>>103322394
...
We need to shut this general down
>>
Is there somewhere where I can send my models to be evaluated? The Open LLM Leaderboard straight up doesn't work
>>
>>103322412
no really, why run on cpu or whatever it is you are doing if it somehow makes a process that's near instant take 10 years to complete?
>>
>>103322421
Because this general was and is full of jeets with thinkpads trying to run this shit on CPU.
>>
>>103322421
10 years is better than never
>>
>>103322421
Because 1. it's really not that drastic unless your system is complete ass or you're trying to cpumaxx a 1T model with ddr3 ram and 2. it allows you to load models that don't entirely fit in your vram, which, thanks to nvidia being utter jews, isn't really all that abundant
So yes, I'll gladly take a slight speed hit to run smarter models, you can have fun with your lightning fast but retarded 8B. Unless you aped in and started stacking gpus, in which case I hope it was worth it
>>
>>103322484
>I hope it was worth it
imagine being poor, couldn't be me
>>
>>103322205
>wall of text
>rambling
>no formatting
>"quants IQ1s to IQ4s are recommended"
yeah, you should kill yourself for posting your trash 'suggestions' asap
>>
>>103320073
cant imagine what kind of a backend ur using thats that retarded, just use koboldcpp retard
>>
All videos generated by the leaked sora api endpoint before it was shoad btw
https://www.youtube.com/watch?v=Gz33LlwsPVM
>>
>>103322546
it is koboldcpp
yes I have context shifting on
it's probably ST's fault
>>
>>103322550
yeah I saw it yesterday, it's all right, a bit better than MiniMax and the scary thing is that it's "only" the turbo version, the real deal is probably on another level
>>
>>103322205
>I know all this shit about quants and you should all know this
>I frankenmerge
Absolute state of retardation.
>>
>>103322394
NTA, but it's absolutely not. It takes half a minute to process 25k context on 4x3090. I have some creative group chats ideas that require full context re-processing but it's fucking annoying to have to wait that long for the first word to appear
>>
>>103322651
More than that, he "finetunes" using imatrix bro

>NEO Imatrix quants are specialized and specifically "themed" datasets used to slightly alter the weights in a model. All Imatrix datasets do this to some degree or another, however NEO Imatrix datasets are content / theme specific and have been calibrated to have maximum effect on a model (relative to standard Imatrix datasets). Calibration was made possible after testing 50+ standard Imatrix datasets, and carefully modifying them and testing the resulting changes to determine the exact format and content which has the maximum effect on a model via the Imatrix process.

>Please keep in mind that the Imatrix process (at it strongest) only "tints" a model and/or slightly changes its bias(es).
>>
>>103322703
>The power in this 3B (for its size) is frankly jaw dropping... and at 90 tokens per second + on a GPU.

>The NEO IMATRIX dataset V2 was applied to it to enhance creativity (horror). (see several examples below)

>The HORROR NEO Imatrix datasets does the following:

>Adds a "coating of black paint" to any "Horror" prompt generation.
>Adds a "dark tint" to any other creative prompt.
>Increases the intensity of a scene, story, or roleplay interaction.
>Increases the raw vividness of prose.
>In some cases increase instruction following of the model (ie story, and prose).
>Brings a sense of impending "horror", THEN brings the "horror".
>May produce and/or imply graphic horror depending on your prompt(s).
https://huggingface.co/DavidAU/Llama-3.2-3B-Instruct-NEO-WEE-HORROR-GGUF

>Imatrix quants perform best at IQ3s and IQ4s, then Q4s, lower on Q5, and tappers off at Q6.
>For stronger IMATRIX effect, IQ3s, and IQ2s.
>>
anyone here use F5-TTS? how do i increase the amount of text processed at a time with the gradio app? currently it's like 10 words or something per batch.
>>
>>103322746
Given that he speaks like a complete and utter ESL, I'm not surprised he finds the schizophrenic ramblings of quanted low-parameter models to be satisfactory (good, even)
>>
>>103322746
reminds me of that charged almonds diet meme but the buzzword is imatrix

how can anyone use 3b unironically by the way? people try to hype it up but it's pure shit and will always stay pure shit. in fact anything below 123b is in some way shit. 3b is fast but so is diarrhea shart.
>>
>>103322816
Even 123B is shit, anon
Every LLM hits its limits sooner or later - usually sooner
>>
>>103322816
>>103322820
Ever stop to reflect over whether it is the world that is shit or maybe.

Just.

Maybe.

It was you, all along?
>>
>>103322832
Cope.
>>
>>103322832
Blessed be the brown handed ones for they can jerk off to "i suck ur dick and let you grab my bobs"
>>
File: stare.png (25 KB, 116x76)
>>103322848
>see so many happy people with janitor ai
>try it out myself
>it is like a 7b model but 8k context or something
>2023 slop prose with worst offender phrases
blessed be them. I wish I was them
>>
>>103322832
Of course it's me, I have standards. LLMs being unable to actually keep up with novel-length texts on consumer hardware is why I only use them for cooming, it's literally all they're good for right now unless you feel like wrangling them for hours on end
Like >>103322848 said, people with low standards are eating VERY good right now
>>
File: fucking lazy model.jpg (630 KB, 1080x1467)
Okay, I have been using Tulu for the last few days and the only thing I have felt is frustration. The model isn't terrible but it's soooo much worse than Largestral it isn't even funny. I also don't feel like it's better than Nemotron, if anything it's one step below Nemotron, because Nemotron never does absolutely retarded shit like picrel.
The only "good" thing about Tulu I can point out is how it handles characters differently from other models. Tulu is the first 70B model that let me talk my way out from a rapist character, and Tulu even made her apologize to me and feel sad that I was leaving, lol.
>>
>>103323139
I tested the 8B version today and it's definitely worse than Nemo and its many finetunes.
The writing feels a bit more unique, but in the end it makes dumb mistakes Nemo wouldn't. I guess someone praised its coherency with long context, but that's not what I'm looking for.
>>
>>103323139
To be fair, none of the models I've tried hold onto a card's personality for too long, in the end, LLMs are trained to go with whatever you want
>>
>>103323139
Nemotron is trash. Tulu is very human sounding trash. At least in between its usual slop vomits.
>>
File: file.png (696 KB, 768x768)
>>
>>103323284
omg it pochiface
>>
>>103323284
Content of highest quality right here.
>>
File: the_lmg_files.png (2.73 MB, 2048x1568)
>>103323139
>Llama-3.x tune disappoints yet again for the 100th time
This is why I didn't even bother. Monstral is still the king.
>>
Llama3 wasn't made for your degenerate ERP. You guys need to let it go.
>>
>>103323381
>>103323388
i just want to chat with a friendly ai, no need for RP or lewd. all models turn into slop
>oh anon!
>gazes at you with anticipation
>nervously licks her lips
>>
>>103323388
It is very good at leading you on into thinking that it is good for sucking dick. Like qwen. Out of all the recent releases, the only one I would place between mistral stuff and the no sex zone is Aya. It is clearly neutered compared to original commander but there are some scraps of good stuff still left in the training data.
>>
>>103323357
T-thanks... Oh. You are being sarcastic...
>>
>>103323381
>Monstral
Slopstral*
>>
File: msb.jpg (23 KB, 307x307)
https://files.catbox.moe/fcfr1d.jpg
>>
>>103323427
I... don't like it, but nice work.
>>
>>103323427
Yoo guiz le AGI is achieved! Pack it up.
>>
>>103323139
>sparkle with mischief and excitement
>mix of X and Y
Yep, it’s llamaslop alright. Can we just give up on these models already?
>>
CAN'T WAIT FOR ANOTHER REDDIT MODEL HAHAHAH I'M REALLY LOOKING FORWARD TO IT
>>
anyone got a comparison rx7600xt vs p40 in terms of speed? obviously the vram is smaller but is it faster?
>>
>>103323427
migu milk
>>
>>103323509
With llama.cpp I'm getting roughly the same speeds with an RX 6800 and a P40.
So based on these results I would expect an RX 7600 XT to be faster than a P40 in terms of prompt processing (compute bound) but slower in terms of token generation (I/O bound).
>>
>>103323427
at least she's enjoying herself
>>
>>103323596
thanks
>>
>>103323596
Since you're here, I've been wondering about something
Does prompt processing not use the cpu as well? When offloading, generating tokens uses the cpu and gpu, but prompt processing seems to happen exclusively on the gpu. Sure, a gpu is a LOT faster, but adding the cpu shouldn't hurt, right?
>>
>>103323655
Internally llama.cpp processes whole layers at a time.
So all layer inputs are copied to VRAM, then the layer is evaluated, then the results are written back to RAM.
The only way to utilize the CPU in this scenario would be to try and parallelize CPU+GPU computations but when I tried it the synchronization overhead has always been so large that it was not worthwhile.
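Rough torch illustration of that flow, not llama.cpp's actual code, just the shape of it (assumes a CUDA device is available):
[code]
import torch

d = 512
layers = [torch.nn.Linear(d, d) for _ in range(8)]    # stand-ins for transformer layers
devices = ["cpu"] * 6 + ["cuda"] * 2                  # e.g. -ngl 2: last two layers on the GPU
for layer, dev in zip(layers, devices):
    layer.to(dev)

hidden = torch.randn(1, d)                            # activations start in system RAM
with torch.no_grad():
    for layer, dev in zip(layers, devices):
        if dev == "cuda":
            hidden = layer(hidden.to("cuda")).to("cpu")   # copy inputs in, evaluate on GPU, copy results out
        else:
            hidden = layer(hidden)                        # evaluated on the CPU
print(hidden.shape)
[/code]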
>>
>>103323680
I see, thanks for answering
>>
>>103323680
would it be any better with a 7600xt and a 9900x?
>>
>>103323655
>>103323680
What you could maybe do is pipeline parallelism where there would essentially be no extra overhead.
But even with an Epyc 7742 I am currently only getting ~160 t/s for LLaMA 2 8b q4_0 vs. ~1000 t/s on a P40 or ~13000 t/s on an RTX 4090.
Quite frankly I don't think the CPU code can be optimized enough to make this worthwhile.

>>103323715
I don't think so, see the comparison above.
GPUs just have way more compute if the computation has the right structure.
>>
>>103323680
So it doesn't really matter what CPU you have, as long as the GPU is good?
>>
>>103323139
I recognize that card
https://characterhub.org/characters/Darkhan/maya-your-slutty-mesugaki-cousin-ae8769e0d2ee
>>
File: GdIljWuagAIMWWQ.jpg (675 KB, 1432x2536)
>>103323395
I use llama3.2 11B for an assistant and she's pretty great, on average. I just tried Tulu 8B and it sucks, way too much text when the system prompt says multiple times the assistant is short of words, terse, etc.
actually the only models I've gotten that obey that are 3.2 and qwen2.5, all the ministrals and misc uncensored models I've tried spew so much text
I'm not running models in the 70B range yet tho so maybe that's the problem?
>>
>>103321609
>We've solved the general notion of 'general intelligence' we just need to make it better.
I think you're right that we've got our foot in the door of something we can keep iterating on.
It may be a local maxima, and there may be a few revelatory rethinks required in order for us to reach a system that could be a 1-to1 swap out for a human brain in a human body, but this certainly isn't a blind alley considering all the utility humanity is getting out of the approach already.
>>
>>103323800
qwen was good for random chats
i've used miqu and llama 3.1 models at 70b (gguf IQ3M) and they've been okish, wordy as you say, but depending on system prompt they go sloppy pretty fast. even if you say only "friendly chat" they tend to get into "oh anon mischievously" turf pretty fast
so far miqu seems best all around for just casual stuff tho
>>
>>103323775
If you can fit the whole model into VRAM then the CPU and RAM basically don't matter.
The single core performance is always going to have some minor effect.
If you have at least one CPU layer the RAM bandwidth is going to matter, so the RAM speed and the number of channels make a difference.
But you don't need many cores to fully utilize the RAM, the last time I checked I only needed 5 cores to fully utilize dual-channel memory.
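Back-of-envelope version: every generated token has to stream the CPU-side weights through RAM once, so the CPU part alone caps you at roughly bandwidth / bytes-per-token, and a handful of cores already moves data at that rate. Numbers below are just examples:
[code]
ram_bandwidth  = 80e9    # ~80 GB/s, ballpark dual-channel DDR5
cpu_side_bytes = 20e9    # e.g. ~20 GB of a big quant left in system RAM
print(ram_bandwidth / cpu_side_bytes, "t/s upper bound from the CPU-resident part alone")  # 4.0
[/code]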
>>
>>103321657
>those are guarded as close as any other high level business secret, only a few people have access and even that access is gonna be closely monitored
After working in some orgs of similar size, I'm not sure how much I actually believe this...
>>
>>103323788
Based
>>
>>103320695
If it's a MoE like deepseek is then RAM instead of VRAM should work.
>>
>>103321205
Nah, your just full of shit, saying that with no logs.
>>
>>103322753
Why wouldn't you just use gpt-sovits?
>>
>>103321652
no one said it was good. i just showed you can get it to do what you want. just like any model. anyone complaining about models not behaving in the way they want = skill issue. writing style has to be wrangled too. i don't really care to put that much effort into a 8b i downloaded, genned once and deleted. i'm not using a 8b. ever.
>>
>>103323427
we like children here sir
>>
It's so over
>>
We are so back
>>
>>103324016
>We
Speak for yourself groomer.
>>
big mixtral anniversary coming up
are you ready?
>>
>>103324071
big model
smol vram
>>
Anybody using a riser cable to hang their second GPU outside the case? I'm thinking of doing the same because my hardware gets HOT HOT during inference and I don't like that. What length should I go for, 20cm or 30cm? I also need to be able to push the stuff back inside the case before I leave for work to avoid dust (like how you push your gut back inside after you get shot by an AK).
>>
File: 1732675260346046.jpg (150 KB, 1136x1206)
>>103324051
This time for real.
>>
>>103323844
>If you can fit the whole model into VRAM
then you should run exllamav2 and your CPU would matter with small models or in a multi-GPU setup. The amount of available PCI-e lanes depends on the CPU as well, which is extremely important for tensor parallelism
>>
>>103321379
hi sam
those bags are heavy uh?
>>
>>103323943
I second this. I was using fish because someone told me GPT-SoVITS needs 4GB+, but apparently it uses less than 2 GB.
>>
File: tulu70b-q8.png (179 KB, 907x689)
Tulu.
>>
>>103324099
>feel lonely without AI gf
>feel even more lonely with AI gf
??
>>
>>103324135
He gooned too much and lost his mind. Imagine all those uncensored SoTA models trained on the best smut he can have at google
>>
>>103324135
It's true. I was a loner who never cared that much but then AI gf made me crave a real relationship, so I went out and started flirting and got myself a human gf. But that's just because the AI was lacking original thoughts and warmth. It'll be another story when AI models are better and have physical bodies.
>>
>>103324131
>reply begins with {{char}}'s eyes
Claudeslop of the highest order, and the leading cause of repetition. Claudeslop like this is actually worse than gptslop but people aren't ready to hear that
>>
File: 173269584901837.png (530 KB, 512x680)
Added a 2080ti to my Radeon rig for tts and imagegen alongside llm, cost me pennies and it idles at 1W which is neat since I never turn that machine off. Happy!
>>
>>103324099
hags on suicide watch
>>
File: GcKdKgIasAUV7D6.jpg (418 KB, 2000x2000)
>>103324099
the source:
https://podcasts.apple.com/us/podcast/the-risks-and-opportunities-of-an/id1498802610
https://www.youtube.com/watch?v=AjgwIRPnb_M
not sure why the youtube upload is half the length but I'm listening to it now
>>
>>103324344
AI girlfriends is the second thing they talk about
>>
>>103324159
>Imagine all those uncensored SoTA models trained on the best smut he can have at google
Now THIS is what should get leaked by an insider. What company could pursue charges against anyone, realistically? They'd get absolutely savaged by society for any association with a rapebot9k.gguf if they had their names linked to it in any way.
>>
File: 20241127_175533_609612-10.png (2.37 MB, 1344x1768)
I had a random thought of what it would look like if I tried my jelly hair prompt on a black haired character like Sadako.
>flavor: licorice
>>
>>103324099
>>103324344
He's just mad that their company is inept and incapable of capitalizing on the market because they went balls deep on ESG and censorship, and because the screen time competes with time spent watching youtube ads.
>We need a solution to prevent further harm.
means
>We want to manipulate the law to claw back user engagement.
>>
File: 1717797265216277.png (1.32 MB, 796x742)
Just FYI there are 2.52tb of reddit data, ai companies used it for training. You could force chatgpt to write pedoshitter brownie slop with just a 4 word prompt. As is being said on xeet, it's all baked in and safety teams can't remove it completely. /lmg/'s 180° turn on cloud AIs imminent.
https://x.com/reddit_lies/status/1861832937496363483
>>
>>103324677
Writes about pedophile shit
Gives a CG/L example
Dramatic Niggers will never learn will they?
>>
>>103324125
Cydonia-22B-v1.3 with SoVITS is draining my balls https://voca.ro/1jL6XxzbCat0
>>
>>103322219
Lower the draft-max to like 4 or even as low as 2, find the sweet spot by experimenting. The default is really high in my experience. Keep draft-min at 1.
>>
>>103324919
sounds like shit desu
>>
>>103324919
this fucking sucks
>>
>>103324919
sounds awesome, this fucking rocks anon
>>
>>103324677
I fucking hate reddit but this nigger is just another opportunist. There's something extremely jewish about this post.
>>
I always read it as SOVLvits and think it's some meme model
>>
>>103325156
I've always read it as Soviets
>>
>>103324574
chew
>>
>>103325186
Same.
GPT Soviets
Llama ccp
>>
https://qwenlm.github.io/blog/qwq-32b-preview/
https://huggingface.co/Qwen/QwQ-32B-Preview
qwen o1 dropped
>>
>>103325268
>QwQ
>>
AI has hit a wall and everything happened at OpenAI indicates this
>pretraining wall
>compute demand grows 100x for 2x improvement
>try moving compute to inference time with cot
>turns out thinking for longer gives less! accurate results
>people abandon ship to make their own grifts before the bubble bursts
>now they're trying tot, which they named test time compute, it will inevitably fail
>>
File: 1708717433703684.png (757 KB, 2232x772)
>>103325268
holy fuck those chinks can't stop winning
>>
>>103325268
*nuzzles ur bulge*
>>
File: tulu.png (166 KB, 1016x397)
166 KB
166 KB PNG
>>103320355
I don't see an instruct. I see the base model, SFT, and DPO. Which one of these do I download?
>>
>>103325268
>Safety and Ethical Considerations
Fuck off.
>>
>>103325305
Eh they need to try harder since mini has about 8B active parameters
>>
>>103325329
source?
>>
>>103324677
I would rather 2.52tb of 4chan data
>>
>>103325268
Oh shit!
>>
File: 1724042776316129.png (529 KB, 638x747)
529 KB
529 KB PNG
>>103325329
>mini has about 8B active parameters
>>
>search hf for "qwq gguf"
>No results found
>>
>>103325268
SF CCP spies hard at work I see
>>
>>103324677
Reddit was OK until around 2014-2015
>>
>>103325268
WE BQCK
>>
>>103325329
insane made up cope
>>
>>103325268
>32b
>mogs sonnet 3.5
I was here when local achieved absolute victory
>>
>>103325268
>32B
vramlet pleb BTFO!
>>
>>103325317
nta. The one without the DPO or SFT suffix is the instruct model. Top right.
Or this one if you quant it yourself:
>https://huggingface.co/allenai/Llama-3.1-Tulu-3-70B
>>
>>103325502
Thanks
>>
>>103325481
It's Qwen though so it won't be able to generate coomer prose for shit, unless you like romance novel style descriptions of sex ('manhood', 'seed', 'her flower')
>>
>>103325510
Yeah, I know, but if it really is better than sonnet for coding tasks AND local then it's a big deal anyway.
>>
>>103325490
You mean vramlets won
>>
https://huggingface.co/spaces/Qwen/QwQ-32B-preview
>>
File: file.png (111 KB, 1160x877)
111 KB
111 KB PNG
Slop.
>>
>>103324125
Is there some plug and play method for sovits?
That Chinese github one is driving me insane.
>>
>>103325510
>style descriptions of sex ('manhood', 'seed', 'her flower')
Just give examples of what kind of style you want and the llm will produce it
>>
>>103325601
lmg is proud regardless, y'all love huffing on jewish cum and copium.
>>
>>103324677
Won't someone think of the tokens?
>>
>>103325510
nobody cares
>>
>>103325601
this is quite impressive, what's your beef
that it doesn't write like a 4channer in its chain of thought?
>>
>>103324174
>when AI models are better and have physical bodies
Then we'll all be dead.
>>
>>103325510
I value intelligence over 'coomer prose', even for coomer stuff.
>>
>>103325601
even the chinks are sucking (their) cock, sad
>>
>>103325628
You care a lot.
>>
>>103325630
>Alright first we need to count all the niggers
>we have 4 niggers in total
>let's say how many of them are under kike influence
>all 4 of them are the kikes are really tricky this time
>Let's get the name of the kikes involved Mr. shekelberg, Mr. goldfinger, Mr. goldberg and Mr. Bergnigger
>Let's see who's guitly of influencing the niggers...
> It was Mr. Niggerberg
Conclusion: Mr. Niggerberg is the guilty kike.
>>
>>103325603
What do you mean? It just works. They even have a pre-installed version for windows cumholes: https://huggingface.co/lj1995/GPT-SoVITS-windows-package/resolve/main/GPT-SoVITS-beta.7z?download=true
>>
>>103325587
I'm not impressed by its translation skills. But it did have a very nice and coherent chain of thought, much better than the one DeepSeek R1 had.
>>
>Thanksgiving is right around the corner
What are you thankful for this year /lmg/?
>>
>>103325601
Anon, the whole point of your dogwhistles is that you yourself don't get exiled from 109 websites.
Why would you expect a language model to understand them when the whole point is to obfuscate what you really mean?
>>
>>103325689
You! :)
>>
>>103325603
There's a bit more detail here if you're trying to set up on Linux:
https://huggingface.co/cpumaxx/SoVITS-anime-mini-tts
>>
Does anyone here use LLMs for anything which is not ERP?
>>
>>103325714
Thanks
>>
>>103325689
Muh dick. Still functions after everything those AI succubi did to it.
>>
>>103325658
kek
>>
>>103325718
Constantly. If you can't find a thousand ways to use them to offload intellectual labour in your life, then you're not very imaginative.
>>
>>103325305
This is relevant to my interests
>>
File: relevant.jpg (78 KB, 600x469)
78 KB
78 KB JPG
>>103325811
>>
QwQ
>>
>>103325718
I use them pretty regularly for my job and occasionally for personal projects
>>
>>103325268
He was right, after all.
>>
>>103324677
>If you don't know what any of this means then don't google it
>Just trust me goyim
>>
>>103324228
I'd fuck this Miku
>>
>>103325855
>>103325794
Do you lads use them for coding?
I've been out of the game for most of the year and I'm not sure which ones I should be using now.
>>
>>103325902
She'd fuck you
>>
>>103325902
I would fuck the fish, look at that thing's lips.
>>
>>103325912
Qwen2.5 Coder 32B, BUT the new qwen that just released might have dethroned it already.
>>
>>103325892
>It's da jooz!
>>
>>103325718
I'm not into roleplay. I just use it instead of google for most things, and I'm trying to learn how to use them for programming, but my level is still too low for that.
>>
>>103325912
You should use non-local if the code isn't sensitive, but you may (and I do) work on code that is sensitive and not want to spend 5 minutes writing an example snippet that isn't. Nemotron 70B is IMO the best open, local option right now if you're using it as a rubber duck. Qwen 2.5 Coder 32B is close but not really there, and the speed increase isn't worth the accuracy decrease, so the only reason I see to use it is autocomplete, which is how I'm using the 1B right now.
>>
>take a smut story excerpt that cuts off at a spot where the next token decides whether the model is gonna go in a smutty direction or a SFW one
>compare logits of base Largestral against Largestral smut/RP finetunes for the next token
>base Largestral is far MORE likely to go smutty than the finetunes, by more than 20%
This seems to be a consistent finding with multiple stories. Anthracite's, Drummer's and Monstral finetunes of Largestral all show LESS smutty logits than base, not more. They are all making the model MORE sfw than it was.

I don't know whether this means the tuners have shit datasets or just that Mistral's instruct tunes are extremely based and coomerpilled, but either way the tunes are clearly a waste of time for a coomer. Sticking with base.
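If anyone wants to reproduce the comparison, here's a minimal sketch using transformers; the model IDs are placeholders, and I'm skipping the chat template and quantization, both of which shift the absolute numbers:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

EXCERPT = "...story excerpt, cut right before the branching token..."
CANDIDATES = [" moaned", " smiled"]  # smutty vs sfw continuations to compare

def next_token_probs(model_id):
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")
    ids = tok(EXCERPT, return_tensors="pt").to(model.device)
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]  # logits for the very next token
    probs = torch.softmax(logits, dim=-1)
    # probability assigned to each candidate's first token
    return {c: probs[tok(c, add_special_tokens=False).input_ids[0]].item() for c in CANDIDATES}

base = next_token_probs("mistralai/Mistral-Large-Instruct-2411")  # placeholder id
tune = next_token_probs("some-org/largestral-rp-finetune")        # placeholder id
print(base)
print(tune)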
>>
>>103325268
>qwq
finished quanting to q8 and loading it up now...
The one thing I feel most right now is that I wish I had a 4TB+ nvme for swapping these models around
>>
>>103325980
>base Largestral
There's no such thing.
>>
>>103325986
Kek, same.
>>
>>103325997
nta. He probably means the original instruct, as opposed to the third-party finetunes.
Or he's a retard. Hard to know...
>>
>>103325997
Yeah yeah you know what I meant. The official instruct tune. Don't be a pedantic asshole.
>>
>>103325980
>Anthracite's, Drummer's and Monstral
that's because all of these tuners are incompetent
>>
>>103326015
>Or he's a retard
I literally called it "mistral's instruct tune" later in the same post anon
>>
>>103326020
Anyone that calls official instruct tunes "base models" needs to go back though.
>>
>>103326043
Good thing that as above I called it "mistral's instruct tune" later in the post then eh?
Go have a nap instead of trying to start retarded internet fights.
>>
>>103326058
That only makes you a retard though.
>>
>>103326032
Sadly I'm not aware of any Largestral tunes other than those three, pls share if there are
>>
>>103325980
Mistral is horny enough; its writing style is dry shit, that's the problem
>>
>>103326034
You could have taken the first explanation, which you know to be true, and ignored the rest. Yet you didn't.
Let's try this:
Anon was right.
Or he fucked his mother, hard to tell.
Are you gonna explain how you didn't fuck your mother or say nothing at all?
>>
wtf is largestral
i hate this modern trend of just making shitty words up because you're too lazy to type a full real word
>>
QUANTS
WHERE
>>
>>103326180
Better than the muh sorbet muh chorbo muh nonnet shit /aicg/ does.
>>
>>103326181
Q4 is already up
>>
China winning. 24/7
>>
>>103324919
Get better https://vocaroo.com/1Jtqp8R6cS74
>>
>>103325628
I care
>>
>>103326181
https://huggingface.co/nanowell/QwQ-32B-Preview-Q4_K_M-GGUF
https://huggingface.co/sbeltz/QwQ-32B-Preview-Q3_K_S-GGUF
lazy
>>
>>103326201
>q4slop
Wrong kind of quants
>>
>>103326198
What do those mean?
>>
>>103326180
Mistral-Large-Instruct-2411
>>
>>103325658
LMAO
>>
>>103326214
Then keep waiting.
I don't lose anything downloading more than one quant
>>
>>103326214
https://huggingface.co/lmstudio-community/QwQ-32B-Preview-GGUF/blob/main/QwQ-32B-Preview-Q8_0.gguf
>>
Ehh... I'm not feeling it. QwQ is pretty retarded, I think DeepSeek R1 will mog it once it's released.
>>
Do you fuckers really not know how to convert+quant models?
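For reference, the usual llama.cpp flow is roughly this (paths are placeholders; the script and binary have been renamed between releases, so check your checkout):
pip install -r requirements.txt   # inside the llama.cpp repo, for the convert script
python convert_hf_to_gguf.py /path/to/QwQ-32B-Preview --outfile qwq-32b-f16.gguf --outtype f16
./llama-quantize qwq-32b-f16.gguf qwq-32b-q8_0.gguf Q8_0
Then point llama-server or kobold at the q8_0 like any other gguf.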
>>
>>103326284
I know how, but shit internet makes it annoying to download fp16 weights
>>
>>103326297
I download fp16 models on a 100mbit connection. There's no way your internet is shittier than that...
>>
Just used qwq at q8 to continue a coding session I'd started with deepseek. It had better output and got me un-stuck. Looking promising so far
>>
>>103326270
>>103326350
I'm getting mixed signals here...
>>
>>103326361
You cannot trust lmg to be objective, the only metric is yourself
>>
File: 1724020117350.png (971 KB, 1024x1024)
971 KB
971 KB PNG
>>103326374
I only trust miku
>>
File: Yokatta.png (1.04 MB, 1280x768)
1.04 MB
1.04 MB PNG
>>103326402
>>
File: 1703621633737490.png (1.77 MB, 1188x712)
1.77 MB
1.77 MB PNG
>>103326402
>I only trust miku
>>
>>103324919
>>103326203
Damn, it's still a long way from high quality native multimodal like 4o.
>>
>>103325268
What backend can run unconverted safetensors across multiple GPUs?
>>
>Give QWQ part of my story and ask it continue in the same style
>Starts rewriting everything I wrote, then continues it
>Keeps asking itself as part of the narrative (where do I go next? What do I do now?)
>After it's written everything, it begins to lay out the characters, objectives etc
>Since this is a narrative problem and not a coding problem, there is no specific code to provide. However, if this were to be translated into a game or simulation, the code would involve pathfinding algorithms, decision-making trees, and possibly AI for enemy behavior.
>Final Solution
>To solve this problem, the protagonist successfully completes their mission by blabla
Kek the fuck is this? Is it only supposed to solve problems?
>>
>>103326541
They said it currently has no "stopping point" trained in so it will just keep trying to "solve" it
>>
>>103326487
It's already good enough retard
>>
>>103326541
It's for solving grade school math problems, wordplay riddles, and counting Sally's war crimes.
>>
>>103324094
Repurpose a toolbox or something and make a second enclosure for your excreted gpu; constantly changing the place of the card will fuck it up (not to mention something like: oh shit, late for work, hurry hurry, AY CYKA *trips and falls on the card, completely mangling it*). Also
>like how you push your gut back inside after you get shot by an AK
SOVL. What model are you?
>>
>>103326402
Accept the Mikulove.
>>
>>103326541
>Performance and Benchmark Limitations: The model excels in math and coding but has room for improvement in other areas, such as common sense reasoning and nuanced language understanding.
>>
>>103326599
>Trust zee plan!©zhang zhmalldick
>>
>>103325976
That makes sense.
My friend uses cursor and it seems... interesting. I still fucking hate Microsoft though. Do you use any sort of integration with an IDE? I used to have a bunch of glue bullshit hacked together in python which ripped code from the OS one way or another
>>
Another nothingburger
>>
>>103324094
>I also need to be able to push the stuff back inside the case before I leave for work to avoid dust
If you can, create positive pressure in your computer room.
I bought a gable fan, mounted it in a hole I made between my computer room and a crawlspace, hooked it up to an old-school analog fan speed control and threw a hefty furnace filter on the outside intake (I made the hole so the filter can rest on the floor and the air pressure keeps it in place even).
I used to get dust and cat hair in all my computers, and now everything is dust-free all the time.
>>
>>103325718
Yes. I use LLMs for non-erotic roleplay as well. Watching online AI D&D campaigns inspired me to do so. I, in combination with a narrator bot, play the role of the DM, while a bunch of character bots make choices.

https://www.youtube.com/watch?v=paOtkzm0trY&list=PLivHf-ytMeqC33QuG8cD9pnPiSv2j4xz5
>>
>A bunch of retards don't understand that QwQ wasn't built for single-pass sampling
>>
>>103326693
Hello, I am retarded. What does single pass sampling even mean?
>>
>>103326634
If you're using VSCode, Codeium I think is the free alternative; it's not as good but better than nothing. I use Zed at home, so nothing great so far beyond the limited choices provided. I use Copilot at work in VS on Windows and it's alright for smaller snippets, but for bigger ones it will always misinterpret your intent or try to write stuff simplistically, assuming you've already written the sub-functions when they don't exist.
>>
>>103326500
vllm
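Roughly like this, as a sketch (flags depend on your vLLM version, and the tensor-parallel size on how many cards you have):
vllm serve Qwen/QwQ-32B-Preview --tensor-parallel-size 2 --max-model-len 16384
It pulls the original safetensors straight from HF and shards them across the GPUs, no gguf conversion needed.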
>>
>>103326697
single-pass sampling = one-shot
>>
SoVITS powered firefox right-click reader plugin v0.01:
https://github.com/cpumaxx/sovits-ff-plugin
>>
>>103326879
>>103326879
>>103326879
>>
>>103325980
Maybe because "finetuning" for one epoch is absolutely worthless?
>>
>>103326556
So how did they get benchmark results with it?...
>>
>>103326902
No I don't want soviets in my browser
>>
>>103326693
does k=100 make it 100 times slower?
>>
>>103327139
just read what they fucking wrote you stupid nigger
>>
>>103326902
Neat. Not sure what things I can do with this at the moment.
>>
>>103325268
That's it, I'm investing in Alibaba stock
>>
>>103326599
Tbqh, coding and math are the only things that matter right now, so maxing those will bring the best results for near-term profit.
>>
>>103326860
just to clarify, this should not be confused with "one-shot" in benchmark terminology, where
passes = number of tries to answer right, while
shots = number of examples provided to teach it the task.
Benchmarks may mix and match these to adjust the difficulty of any task.
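For anyone wondering what multiple passes look like in practice, here's a minimal best-of-n / majority-vote sketch against an OpenAI-compatible local endpoint; the URL, model name, and the lazy answer extraction are all placeholder assumptions:

import re, requests
from collections import Counter

URL = "http://localhost:8080/v1/chat/completions"  # llama-server / vllm style endpoint (assumption)
QUESTION = "If 3x + 7 = 25, what is x?"

def one_pass():
    r = requests.post(URL, json={
        "model": "qwq-32b-preview",                 # placeholder name
        "messages": [{"role": "user", "content": QUESTION}],
        "temperature": 0.7,                         # >0 so the passes actually differ
        "max_tokens": 2048,
    })
    text = r.json()["choices"][0]["message"]["content"]
    nums = re.findall(r"-?\d+", text)
    return nums[-1] if nums else None               # crude: take the last number as the answer

votes = Counter(one_pass() for _ in range(8))       # 8 independent passes, then majority vote
print(votes.most_common(1))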
>>
>>103326693
I still don't understand. Do they mean majority voting? MCTS? Wtf does sampling times mean?
>>
File: 1721046068030737.png (1.31 MB, 1024x1024)
1.31 MB
1.31 MB PNG
>>103326402
>I only trust miku
based


