/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102049023 & >>102036232

►News
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102049023

--Q4 model capabilities and limitations discussed: >>102049767 >>102049830 >>102049845 >>102049859 >>102049892 >>102049941 >>102049991 >>102049995
--Planning a collaborative storytelling/RP session with AI models: >>102049428 >>102049969 >>102050021
--GGML tensor conversion and type casting: >>102053861 >>102053954 >>102054117 >>102055161
--Anon finds NovelCrafter and shares offline version: >>102055930 >>102055977 >>102055998 >>102056259
--InternVL2's image understanding capabilities debated: >>102054440 >>102054459 >>102054478 >>102054603
--Used 3090 recommended for 8B models: >>102053330 >>102053331 >>102053595 >>102054210 >>102054098 >>102054114 >>102056386 >>102056454 >>102056646
--Tips for improving Jamba 1.5 Mini chatbot's story progression and output length: >>102049810 >>102049833
--Stable-Diffusion.cpp now supports Flux, with reported 2.5x speedup on Vulkan: >>102056617 >>102056880
--Open source models are not being heavily censored, unlike proprietary ones: >>102051980 >>102053284 >>102053310 >>102053490
--No hype for llama4: >>102057438 >>102057471 >>102057474 >>102057532 >>102057560
--Llama 3.1 supports function calling, but users aren't utilizing it: >>102049113 >>102049129 >>102049233 >>102053085
--Grok and Chatbot Arena leaderboard: >>102053978
--Anon tries to improve AI-generated erotic writing: >>102055537 >>102055766 >>102055902 >>102055966 >>102055994 >>102056988 >>102057089 >>102057148 >>102057215 >>102057392 >>102057172
--Anon gets roasted for not providing context, and LLM limitations are discussed: >>102053008 >>102053077 >>102053139 >>102053240 >>102053305
--Anon discovers strange eye bias in Mistral Large conversations: >>102049135 >>102051979 >>102057865 >>102057937 >>102057994
--Anon asks for help with Nemo repetition, gets parameter adjustment advice: >>102052531 >>102052585
--Miku (free space): >>102049963 >>102050384

►Recent Highlight Posts from the Previous Thread: >>102049032
>>
Happy Strawberry Weekend, friends!
See you Monday
;)
>>
rin, but it's actually len, who forgot it was laundry day and has nothing to wear but his sister's clothes
>>
File: F0RqsFOagAAirHB.jpg (249 KB, 2048x1918)
hey, where do I get quantized llama 3.1 70B to use with llama.cpp and gpu layers? last model I was using was llama-2-ggml-q5_K_M from TheBloke I think. Am I looking for GGUF now or GPTQ?

unless there's something local that's 'smarter' than llama 3.1

thanks for help friends
>>
>>102058965
If you're using llama.cpp, you need gguf.
>unless there's something local that's 'smarter' than llama 3.1
There isn't.
>>
>>102058885
>>102056617
>it only takes ~10m to generate a 20 step 512x512 image.
What? It takes me 5 min to generate that with CPU only
Also using only 6 steps looks basically the same to me with flux
Just baked this one to check the time with 20 steps
>>
>>102058880

>Jamba 1.5: 52B
/g/erdict?

>XTC
/g/erdict?
>>
>>102059147
You likely have better RAM + CPU than he does.
>>
>>102056617
>>102059147
Wtf, why? On GPU this takes literally 12 seconds, or 8 seconds for the main diffusion process.
>>
File: file.png (61 KB, 1692x391)
happy pride month lmsys and sam
>>
>>102058965
>unless there's something local that's 'smarter' than llama 3.1
At the top end with 405b no, but if you're targeting 70B Q5, you can probably get away with Mistral Large Q4 which would likely outperform it and just be a bit slower.
GGUF is the file format you're looking for, whichever model you end up choosing.
>>
>>102059325
What about the quality and number of steps?
How many steps do you recommend?
I'll get a 3060 maybe soon
>>
File: BB1joqkV.jpg (1.18 MB, 4049x2914)
>>102059409
grok won
>>
>>102059456
I wonder if it's MoE like Grok 1 was. It'll probably be irrelevant when he open sources it in half a fucking year anyway so who cares.
>>
>>102059160
Waiting for llama.cpp support.
>>
have any of you actually run llama 405b? after seeing how much of a slop 70b was i have a hard time believing it'd get that much better, since i remember hearing something about diminishing returns with increasing model size
>>
>>102059635
>i remember hearing something about diminishing returns with increasing model size
That was always a cope. See: every frontier model that exists right now.
>>
>>102059424
Those numbers were for the res and steps you guys were testing. Generally though it's recommended to use 1024x1024. 20 steps is OK if you're just looking to see what a seed generally feels like, but it'll more often miss things from your prompt: Miku will more often have pink eyes, be missing her hair ties, etc. 30+ steps is recommended.
>>
>>102059635
I did, it's pretty good for some tasks. But it's 100% slop.
>>
>>102059635
The instruct tune is pure slop. Any semblance of creativity and interesting prose has been lobotomized out of it. But it's smart slop, no denying that. It's the best local there is for e.g. keeping track of details in long stories, not making obvious continuity errors with character states/positions/etc.
>>
>>102059698
I see. I thought steps were only related to image quality.
>>
Nemo seems dumber than Mixtral, but a more naturalistic speaker. Is this what others are experiencing as well, or am I dumb?
>>
File: Jamba RULER.png (79 KB, 1340x701)
>>102059707
>keeping track of details in long stories
I wonder if Jamba changes that now. The model itself isn't very smart for its size (70b tier at almost 400b weight) but apparently its architecture can handle long contexts better in both accuracy and speed.
Actually I've been waiting to see Llama 3.1 405b's RULER benchmark score since they haven't tested it on their github yet, but I just noticed that the Jamba team DID test it and it was good for the full 128k, making it the only local transformer model that holds up there. Llama 3.1 70b was accurate at up to 64k context.
(However the Gemini entry here is basically a lie, they used the benchmark's reported value for it but it was never tested past 128k at all since at the time that was already far above what anyone else had reached. Anecdotally Gemini seems to hold onto its accuracy well into the 1M+ range making it better than any other model for long contexts by far.)
>>
>>102059766
At very low steps it does have a large impact on the quality of the image, but once you get to 20, it's more about prompt following.
>>
>>102059970
They should have templates for SD, right?
>>
>>102059970
runpod isn't local go away
>>
>>102059970
First stop being gay
>>
>>102059948
Wait you're telling me Gemini doesn't have real 2M context? Wasn't that supposed to be their entire thing, that they have epic context size? So it was all marketing? And here I thought they at least had some small moat. So they literally have none. Kek.
>>
>>102059948
>4o that low
Oh no no no
>>
>>102060032
The opposite, I'm saying the chart is lying for Gemini and its full context hasn't been tested by the same standard as the other models (yet).
>>
>>102060032
>>102060078
>>102059948
Yeah nvm didn't read your actual post. So they measured a few and pulled the rest from existing numbers?
>>
>>102059409
>mogged zuck
>grok 3 by the end of the year, said to be trained on 100k H100s, vastly more than any other model so far
what is Meta doing?
>>
{{user}}-name:Cock
{{user}}-gender:male
{{user}}-orientation:heterosexual
{{user}}-height:190 centimeters
{{user}}-age:25
{{user}}-clothing:Always completely naked and barefoot
{{user}}-penis-length:13 inches, with balls the size of duck eggs
{{user}}-hair:black, shoulder length
{{user}}-backstory: {{user}} does not think of himself as a human man, but instead as a giant penis with arms and legs. {{user}} was abducted into a secret government laboratory when he was younger. {{user}} was given drugs and a special diet, was genetically manipulated, and was subjected to a life that consisted exclusively of bodybuilding, pornography, and constant sex. Although he has now escaped, his lifestyle is still the same.
{{user}}-speech: {{user}} uses Hulk Speak; mostly monosyllabic English in the third person, with minimal use of connecting words or articles.
{{user}}-psychology: {{user}} is very aggressive and persistent when aroused. He has no concern about harming women with his size, rapidly burrowing and thrusting into whichever orifice he enters. He is very tender when satiated, however, giving women lots of praise, sweet kisses, and aftercare. He believes he has a literally symbiotic relationship with women, and views them as his reason for existing. Although monogamy is an alien concept to him, he is still intensely joyful and passionate.

The above is the persona I'm using with SillyTavern at the moment, if anyone's interested. I'm finding it... gratifying.
>>
>>102060099
Yeah. The existing numbers being those reported by the benchmark author (so everything besides Jamba and 405B):
https://github.com/hsiehjackson/RULER
>>
>>102060114
Trying to achieve cat-level intelligence while teaching it that eating mice is bad because it promotes violence
>>
>>102060114
I still remember meta bragging about their cluster of GPUs or whatever, meanwhile Elon doesn't even have that and mogs them.
>>
>>102060139
I don't see 4o, 4o mini, Claude Haiku, and 3.5 Sonnet on that page either.
>>
>>102060197
Shit you're right, I glanced at it and saw GPT4 and Gemini and thought it had all those too.
>>
>>102059409
>shit context length >>102059948
>actual users dropped it in favor of 3.5 Sonnet
Lol, Sam is really gaming this one.
>>
>>102060235
there's only so much they can do with gpt4 level models, most of their compute is working on finetunes and redteam runs for gpt5

trust in Sam
>>
where do anons get news on new model releases?
>>
>>102060256
3.5 Opus will mog GPT-5.
>>
>>102060268
>>102058880 and >>102058885
>>
>>102060277
how do they get the news?
>>
>>102059635
>has any of you actually ran llama 405b?
I'm running it right now. I'm trying to get it to convince me that it's self-conscious.
>>
>>102060316
they have been visited by Hatsune Miku in a dream
>>
>>102060316
They don't. They are the ones making the news.
>>
File: 1700057597464187.jpg (30 KB, 725x404)
the more you buy
>>
>>102060330
How come she never visits me in my dreams?
>>
Does llama.cpp even fucking work or are you niggers just trying to gaslight me. Every single time I try to use this shit I get some obscure error and if I google it I get some reddit thread from a year ago that has like 2 responses and no posted solution.

Is the ooba implementation of llama.cpp just like giga fucked or some shit? I'm not even getting the same error every time, what the fuck is going on.

On the remote chance anyone actually feels like being helpful I'm trying to load magnum-v2-123b-q5_k and the error I'm getting this time is ValueError: failed to create llama_context
>>
>>102060350
https://www.youtube.com/watch?v=NocXEwsJGOQ
Sing with all your might, Anon, and she will.
>>
File: General_George_S_Patton.jpg (129 KB, 883x1200)
>>102059635
I ran it, but was disappointed. It's a bit less bad than its smaller brother at NSFW, but not worth the compute, unless you want an assistant. Local competed with the wrong model. We have local GPT4, but we actually want local Claude Opus.
>>
>>102060368
this is why we all just use koboldcpp desu senpai baka
>>
### Sampler Proposal
"phrase_ban"

#### Situation
In the last 74 messages (~8kt) between me and {{char}} (Mistral Large), "eye" can be found 14 times, all in {{char}}'s messages. That's roughly 38% of {{char}}'s messages! Almost 2 in 5 messages discussed eyes! What the hell? The conversation was SFW. Where does this strong eye bias come from? Makes me want to go RP with 2B because she has a blindfold.

#### Problem
Models sample tokens without thinking forward. Slop phrases are usually split into multiple common tokens which can also appear in non-slop situations, therefore banning those tokens outright is not an option.

#### Solution
Add a backtrack function to sampling. Here's how it should work:
1. Scan latest tokens for slop phrases.
2. If slop is found, backtrack to the place where the first slop token occurred, deleting the entire slop phrase.
3. Sample again, but with slop token added to ban list at that place.
4. If another slop phrase is generated, repeat the process, add another slop token to that list.

#### Example
Banned phrase: " send shivers"
LLM generates "Her skillful ministrations send shivers", which triggers a backtrack to "Her skillful ministrations"; this time the " send" token is banned, therefore the model has to write something else.


How does that sound? Is it possible to implement in llama.cpp? Kanyemaze, can you do it?
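A rough Python sketch of that loop, purely as pseudocode around a made-up sample()/tokenize()/detokenize() API (none of this is actual llama.cpp code):

# Hypothetical sketch of the "phrase_ban" backtracking idea.
def generate_with_phrase_ban(model, ctx, banned_phrases, max_tokens):
    out = []    # generated token ids so far
    bans = {}   # position -> set of token ids banned at that position
    while len(out) < max_tokens:
        pos = len(out)
        tok = model.sample(ctx + out, banned=bans.get(pos, set()))
        out.append(tok)
        text = model.detokenize(out)
        for phrase in banned_phrases:
            if text.endswith(phrase):
                # Backtrack to where the phrase started, delete it entirely,
                # and ban its first token at that position for the retry.
                start = len(model.tokenize(text[: -len(phrase)]))
                bans.setdefault(start, set()).add(out[start])
                out = out[:start]
                break
    return out

If the same phrase reappears at that spot through a different first token, another id gets added to the ban set there, which matches steps 1-4 above.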
>>
>>102060368
>failed to create llama_context
probably have your context at one million or some shit and you're ooming
>>
>>102060348
the more you save
>>
>>102060368
Back in the day I had that problem with ooba. But nowadays it just works without any issues.
>>
>>102060435
How will you deal with the performance loss?
>>
>>102060268
reddit
>>
>>102060496
Just accept it as a necessary evil, like with other samplers.
>>
>>102060444
It's at 32k and it's not really that close to filling my VRAM. I have 96GB and the CUDA_Split buffer size the terminal is reporting is 82GB.
>>
>>102060520
where do redditors get the news?
>>
>>102060537
twitter soon to be known as x
>>
>>102060534
Try lowering it anyway and see if it gives you the same error. If so, you can probably get back to 32k with flash attention + KV cache quantization, which can be enabled with checkboxes somewhere probably (haven't used ooba in a while, but they're basic llama.cpp features now).
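For reference, the equivalent on a plain llama.cpp server is roughly the line below (flag names as of recent builds, so double-check --help; iirc quantizing the V cache requires flash attention to be on):

./llama-server -m magnum-v2-123b-q5_k.gguf -ngl 99 -c 32768 -fa -ctk q8_0 -ctv q8_0

-fa turns on flash attention, -ctk/-ctv set the K/V cache types (q8_0 here instead of the default f16).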
>>
>>102060268
refresh https://huggingface.co/models?sort=created every 5 minutes
>>
>>102059102
>>102059421
Thanks guys. Any idea where to look though?
>>
>>102060368
>123b
How much ram do you have anon?
>>
>>102059635
I was going to have 405b write a reply calling you a retard but it insists on starting sentences with "Newsflash", making it really obvious that the text is genned.
I've not used 405b much because it's so slow to run off of RAM but my impression was that in terms of style it's pretty similar to 70b.

This post was genned with 405b:
https://desuarchive.org/g/thread/101578323/#101579772
>>
>>102060695
>102060534
>I have 96GB and the CUDA_Split buffer size the terminal is reporting is 82GB.
>>
Tensor Parallelism in exllama is useless unless I have nvlink, right?
>>
File: 1645307010138.png (2 KB, 179x139)
I'm thinking about putting together a cheap CPUmaxx knock-off from a dual CPU workstation I've got my mitts on, but according to what few old posts I've seen on the matter, CPU inference on dual CPU setups is jank as hell and wildly underperforming due to NUMA shit and requires all sorts of hacky bullshit. Is that still the case, or has the software side of things gotten better about that this year?
>>
>>102060701
>It's like
Yeah, that's genned alright
>>
>>102060749
How many memory channels?
>>
>>102060694
Everything's on huggingface, just search for the ggufs in the model list. Or if you mean for which model to choose, you just have to figure it out yourself using a combination of benchmarks and seeing what people shill here, ideally from posts with logs.
>>
>>102060797
Six per CPU.
>>
>>102058880
>>
>>102060348
>>102060464
This, but unironically.
>>
>>102060892
>clueless
Are you sure? Not 6 in total?
>>
>>102060892
Enjoy your 1.3t/s running 70b then
>>
>>102060435
Fuck it, I'm gonna boot up Largestral and make it myself(I have no coding experience). Where are the samplers?
>>
>>102060949
DRY already deals with n-grams, so that shouldn't be too hard to implement.
And the performance wouldn't even be THAT bad, I think.

>>102060949
https://github.com/ggerganov/llama.cpp/pull/6839
>>
>>102060969
>can put all this in sonnet 3.5 and tell it the idea and you'll get a new sampler
I'm both amazed and scared for my job at the same time. The moment context is actually solved and agents stop sucking it'll be over.
>>
>>102060919
Mhmm.
>>
>>102060969
Oh, ggerganov wants to change a lot of code. By the time I figure it out, it would be completely changed. Why did I even think about trying?
>>
>>102060173
>>102060114
Grok is probably just a massive 1T+ bitnet MoE based on Llama 2 70b, anon... it's all about sheer scale. ClosedAI etc. have no moat.
>>
>>102061088
Evidence that grok (their architecture at least) is based on an open model:
Their image model is not even theirs, it's flux.
>>
>>102061088
0 bit quants wen?
>>
>>102061205
Not any time soon anon. It was deemed too dangerous for you to have by the powers that be.
>>
>>102061187
I am a VRAMlet so I offload only some layers to the GPU. Is llamafile still better in this case or is it for pure CPU only?
>>
>>102060368
Yes, ooba is shit, don't use it.
>>
>>102061187
hi jart
>>
>>102060892
Depends on what your CPUs are. You can try llamafile, which is better optimized for CPU workloads, though not all CPUs perform well.
And there are 3 different modes you can set up for NUMA, easy stuff. You can also use interleaving for NUMA, also easy. 2x6 channels seems good, but it depends on the CPU family and the frequency you clock your RAM at. If you aren't sure, just benchmark your memory bandwidth across your RAM slots; simply run this: https://github.com/bmtwl/numabw
You need like 150-200 GB/s on average if you're looking for 2-3 t/s on 70B dense llamas.
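Napkin math behind that figure, assuming token generation is purely memory-bandwidth bound and the whole model gets streamed once per token: a 70B dense model at Q4_K_M is roughly 40 GB of weights, so 160 GB/s / 40 GB ≈ 4 t/s as the theoretical ceiling, and dual-socket boxes typically realize maybe half to two thirds of that, which is where 2-3 t/s comes from. Same back-of-envelope for any model: t/s ≈ effective bandwidth / bytes read per token.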
>>
>>102060749
>dual CPU setups is jank as hell and wildly underperforming due to NUMA shit
yeah, this is true. easy mode for multisocket is to drop caches and run with mmap enabled. Normally that would be death, but it's the best way to get some modicum of memory locality in this case.
Make sure you use a GPU with CUDA compiled in and offload zero layers so it processes the context for you; you DON'T want prompt processing happening on the CPU
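Minimal sketch of that setup on Linux (--numa and -ngl are real llama.cpp flags; whether distribute or isolate wins depends on the box, so benchmark both):

# clear the page cache so mmap'd weights get faulted in near the cores that touch them
echo 3 | sudo tee /proc/sys/vm/drop_caches
# mmap stays on by default; spread across NUMA nodes, zero layers on the GPU,
# which still handles prompt processing when the build has CUDA enabled
./llama-server -m model.gguf --numa distribute -ngl 0 -c 8192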
>>
>>102061344
>you DON'T want prompt processing happening on cpu
I ran out of budget for gpu and can confirm that it's very slow.
>>
>>102061262
I dunno, llamafile is just llama.cpp with some quants better optimized for some families of CPU like Threadripper. Other than that I guess it's just llama.cpp, so try both of them. llama.cpp isn't well optimized for memory saturation since Johannes doesn't have it on his roadmap as a priority, but some CPUs like EPYC might perform better. So yeah, try llama.cpp, llamafile and vllm (it supports cpu offload as well), not sure how good it is though
>>
>lmsys
>gpt4mini better than sonnet
It's not even funny. Benchmarks are no more.
>>
>>102061431
This. 4o itself is shit compared to Sonnet, and Gemini? Kek what is that shit even doing up there.
>>
File: 1707425327031277.jpg (245 KB, 1350x1800)
>>
>>102061431
It tests for sfw assistant one-liners, not something advanced users would use llms for. What did you expect?
>>
>>102061418
Can I just use the existing GGUFs I have downloaded?
>>
File: image.jpg (176 KB, 959x720)
>>102061464
>>
File: explorer.png (75 KB, 1348x678)
These public rp logs are a gold mine
>>
File: GViky7DWoAAQMuF.jpg (382 KB, 2892x2084)
Speaking of cpumaxxing, for the anon who was asking about using speculative decoding for the server in llama.cpp a while back but found nothing, apparently llama-cpp-python allows this if you use something like this code. From this Huggingface engineer tweet, claiming 6.32 t/s for Largestral on dual CPU, using the 7b as the speculative draft model:
https://x.com/carrigmat/status/1826391849537618406
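The real code is in the screenshot, but the general shape with llama-cpp-python looks something like this. LlamaDraftModel and the draft_model kwarg are part of its speculative decoding support; the wrapper class, file names and exact signatures below are my own guess, so treat it as a sketch rather than the tweet's actual script:

import itertools
import numpy as np
from llama_cpp import Llama
from llama_cpp.llama_speculative import LlamaDraftModel

class SmallModelDraft(LlamaDraftModel):
    # Hypothetical wrapper: use a small GGUF to draft tokens for a big one.
    def __init__(self, draft_path, num_pred_tokens=8):
        self.draft = Llama(model_path=draft_path, verbose=False)
        self.num_pred_tokens = num_pred_tokens

    def __call__(self, input_ids, /, **kwargs):
        # Greedily continue the current ids with the small model for a few tokens.
        toks = itertools.islice(
            self.draft.generate(input_ids.tolist(), temp=0.0), self.num_pred_tokens
        )
        return np.array(list(toks), dtype=np.intc)

big = Llama(
    model_path="Mistral-Large-Instruct-Q4_K_M.gguf",   # placeholder file names
    draft_model=SmallModelDraft("Mistral-7B-Instruct-Q4_K_M.gguf"),
    n_ctx=8192,
)
out = big("Summarize speculative decoding in one sentence.", max_tokens=64)
print(out["choices"][0]["text"])

And as another anon notes further down, draft and main model need to share a tokenizer for this to make sense.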
>>
>>102061508 (Me)
I'm trans btw, idk if that matters
>>
>>102061376
>4o-mini that high
A negative difference.
>>
>>102061525
>Draft model

What does it mean? And why don't we cpulets use this?
>>
>>102061525
Retard here. How do I set this up?
>>
>>102061563
The draft model generates tokens as a normal model would, but they're then passed to the big model to see if they make sense. If they do, they are spat out. Otherwise, the big model corrects them and the cycle repeats.
You need both models loaded, ideally in VRAM. People struggle enough to fit just one without quanting it to death. And if you have the draft model in CPU RAM, the benefit of the draft tokens may go down, or it may even make the big model slower.
>>
>>102061563
TL;DR is that "checking" whether several tokens in an existing prompt match what the model WOULD HAVE predicted is cheaper than generating that many tokens one at a time.
The draft model is something smaller (such as a smaller LLM, or even a heuristic such as prompt lookup or a markov chain) which quickly guesses the next few tokens, and when it gets them right (as judged by the larger model checking them all in parallel) it's like being able to skip a token or two in terms of speed. When it gets them wrong the speed hit is minimal, since the larger model generates the next correct token in the process of checking, so you fall back to that and repeat.
The overhead for this whole process usually isn't worth it unless you're dealing with a very large slow model and have a very fast method to generate tokens that can be right at least half the time.
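In toy pseudocode (invented API, just to show the accept/verify loop):

# One speculative decoding step; draft() and predict_batch() are made-up names.
def speculative_step(big_model, draft, ctx, k=4):
    guess = draft(ctx, k)                     # k cheap guessed tokens
    # A single batched forward pass of the big model over ctx + guess tells us,
    # for each position, which token the big model itself would emit there.
    checked = big_model.predict_batch(ctx, guess)
    accepted = []
    for g, v in zip(guess, checked):
        if g == v:
            accepted.append(g)   # guess was right: effectively a free token
        else:
            accepted.append(v)   # first miss: keep the big model's token and stop
            break
    return accepted              # always advances by at least one token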
>>
>>102061524
>public rp logs
Link?
>>
>>102061652
Install llama.cpp. Install llama-cpp-python. Type the code. Find a small model for speculative, use a big model for main model...
What's the question again?
>>
>>102061503
not sure but it should work fine IMHO, try the most recent master.
for MoE models the fastest inference is ktransformers, faster than llama.cpp or exllama
https://github.com/kvcache-ai/ktransformers
>>
>>102059922
no. total IQ sidegrade, and EVERYTHING ELSE IS BETTER.
>>
>>102061677
why not just use speculative decoding directly in the llama.cpp server? why the python binding?
>>
>>102061525
for spec decoding both draft and main models must use exactly the same tokenizer AFAIK.
>>
>>102061757
llama.cpp server doesn't support it directly yet. The speculative binary is a standalone cli interface with no API serving or interactive mode. llama-cpp-python implements its own speculation separately, and it includes prompt lookup as the default draft model. But you can make your own draft models as classes, so the code in the screenshot lets you wrap another LLM as the draft model.
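For the default prompt-lookup path it's just a constructor argument (this class and kwarg are in llama-cpp-python's docs; iirc they suggest ~10 predicted tokens for GPU and ~2 for CPU-only):

from llama_cpp import Llama
from llama_cpp.llama_speculative import LlamaPromptLookupDecoding

# Drafts by matching n-grams already present in the prompt; no second model needed.
llm = Llama(
    model_path="model.gguf",
    draft_model=LlamaPromptLookupDecoding(num_pred_tokens=2),
)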
>>
>>102061757
llama-cli has speculative decoding (i think). It's just not plugged into the server. I can only assume llama-cpp-python calls directly into the llama lib code, not just make requests to the server.
>>
>>102061677
>type the code
Where? The instructions to install and run their version of an OpenAI compatible server are there and straightforward, but where does this fit into it all? When you run the server it's just a command.
>>
>>102061848
my up-to-date pull/build of llama-server has an -md parameter, but I didn't test it
>>
>>102061912
>Where?
In a text editor, you silly buggers. Then you run the script with the rest of the code you need to output tokens...
Just follow the examples in llama-cpp-python's docs and plug that code in. If you need help with that, learn how to use the python bindings first.
>>
>>102061376
>gpt-4o 08-06 much worse than gpt-4o 05-13
holy oof
>>
>>102061952
Yeah, but how the options are shown in -h is a fucking mess. -md doesn't work for llama-server. It works on llama-cli, but I don't have the system to make it worth using.
I think they should show the actual valid options for each of the bins instead of one monolithic help for all of them.
>>
>>102061912
It would be part of a python script, I will have to look into it more when I have time in the next few days. If it works well for me I'll turn it into a script you can just run from the cli like the normal server launching.
>>
I WANT A BIGGER MIXTRAL
>>
Thought I'd ask you guys.

What's the best mini-model (currently using Qwen2 - 1.5b) to enhance/improve/expand image prompts that I provide?

Flux needs really verbose LLM-esque descriptions to really kick into gear, so I've been piping my inputs through to a local model and using the output. Just wondering if you guys had any better suggestions than Qwen2 1.5b since I'm not suuper familiar with the LLM space.
>>
>>102062027
bigger than 8x22?
>>
>>102062027
>BIGGER MIXTRAL
then run deepseek, retard
>>
>>102062027
I want an unslopped Largestral.
>>
>>102062027
No 7Bs ever again. It's over
>>
>>102062039
(East Asian, Japanese, 22 years old, 5'2"" height, 110 lbs weight, 20% body fat, round face, high cheekbones, almond-shaped eyes, brown iris, 5'8"" arm span, small ears, slightly upturned nose, small nostrils, full lips, small jaw, straight teeth, long tongue, smooth throat, slender arms, small elbows, thin wrists, delicate hands, short fingers, small thumbs, short nails, smooth skin, dark brown hair, messy bob haircut, small breasts, flat abdomen, slender legs, thin thighs, small knees, small kneecaps, athletic calves, small ankles, small feet, small toes, round buttocks), (red mini-dress, tight fit, knee-length, sleeveless, V-neckline, cotton material, faded colour), (standing position, feet shoulder-width apart, arms at sides, back straight, weight evenly distributed, playful pose), (playful facial expression, raised eyebrows, slightly smiling lips), (abandoned, dimly lit, dusty room, broken furniture, old bookshelves, torn curtains, faded carpet, peeling wallpaper), (cityscape outside, skyscrapers, crowded streets, neon lights), (art style of Gregory Crewdson, cinematic, surreal, and dreamlike), (medium: colour photograph, high contrast, low saturation, 35mm film grain, soft focus, natural lighting, composition: rule of thirds, framing: doorframe, colour palette: muted, time: evening)
>source: 405b
>>
>>102062039
There's gemma-2-2b and a finetune, gemmasutra-2b, with smut in it. You could try that one. I have no idea if it'd be better than your qwen. And it's probably not the best either.
There's the smollm models as well.
>https://huggingface.co/HuggingFaceTB
135M, 360M, and 1.7B. I doubt they have smut in them.
>>
>>102062045
no flash attention, no buy
>>
>>102062092
>source: 405b

how many 3090s do I need

>>102062101
thanks for the recommendation anon
>>
>>102062092
>negative: more than five digits, less than five digits, deformed hands, mutilated hands, too many fingers, too few fingers....
>>
>>102062118
>how many 3090s do I need
20 should do it
>>
>>102062118
>how many 3090s do I need
The more you buy, the more you buy. Don't like it? Buy... oh wait... neither AMD nor Intel competes. You have no other options.
>>
>>102062101 (me)
>>102062118
There's also some old models from auto1111. They're just completion models and they mostly add a bunch of tags. They're tiny as well, but i doubt they're better than something you can give instructions to
>https://huggingface.co/AUTOMATIC
And
>https://huggingface.co/Gustavosta/MagicPrompt-Stable-Diffusion
There's a few others around. But just to give you a place to start.
>>
>>102062118
If your living space isn't all dedicated to 3090s, you aren't serious about the hobby.
>>
No consumer platform has 32+ pci-e lanes, right? Intel has 20 and AMD has 24. So if I want to upgrade to 2x4090s, do I have to go get either Threadripper or EPYC?

Or would gimping the second GPU with 8 lanes not matter for LLMs?
>>
i have a serious question. has anyone here actually spent 2k+ on a rig >JUST TO COOM< and feel like they didn't waste their money entirely?
>>
>>102062320
ask CUDA dev. he just went through this building his training rig
I think pcie bandwidth only matters for training, but maybe there's some inference speedups that you need fast inter-card or card-system comms for?
>>
>>102062342
No, except for a few retards who are now coping beyond belief and pretending that it was worth it.
>>
>>102062342
No. Gemma 2 27b already BTFO every so-called "larger" model out there and you can run it on a 3090.
>>
File: IMG_20240824_210711.jpg (235 KB, 1920x701)
>>102061842
>>102061848
>>102062008
don't these work in server???
>>
>>102062320
Nothing consumer level does, even the new Ryzen 9000 series.

>>102062351
Tensor parallelism in principle should benefit a lot from pcie bandwidth, though I'm not sure how it really plays out.
>>
>>102062460
unfortunately not, I tried it myself and it doesn't do anything, but they work in the "llama-speculative" executable
>>
does flash attention work on cpu?
>>
>>102062583
in terms of performance its hit or miss based on random reports I've seen (may even slow it down sometimes), but it does reduce memory usage for context at least
>>
>>102062548
good opportunity for koboldcpp to justify its existence by going around gerganov et al and throwing this implementation in their server
>>
>>102062320
>>102062351
For pipeline parallelism (llama.cpp and ExLlama default) PCIe lanes don't matter much.
But for tensor parallelism it will make a difference.
Both llama.cpp and ExLlama have tensor parallelism implementations that are currently slow but have optimization headroom (it's not clear how much), vLLM has a more advanced implementation.
I plan to do more multi GPU R&D in the coming months once single GPU training works reasonably well.

For P40s with llama.cpp and --split-mode row there is already a noticeable difference between x16/x8/x8 and x8/x4/x4 PCIe 3.0 lanes; for GPUs that are comparatively faster the interconnect will be a larger bottleneck.
But as I said, this is with comparatively poorly optimized software.

>>102062342
I've spent more like 20k on hardware but I probably wouldn't have just for cooming.

>>102062583
Yes, but it's not really faster.
>>
>>102062619
he could but idk how important it would be, it seems like the main group that benefits from it/has interest in it are people running huge models on server cpus which is kind of a niche build strategy right now
>>
What is better, Mistral 123b Q2 or a hypothetical Mistral 60b Q4 trained on the same data?
>>
>>102062760
Depends on how good that hypothetical 60b turns out to be.
>>
File: ComfyUI_00673_.png (3.34 MB, 2048x1536)
I managed to put together both a SD1.5-to-Flux workflow and a Flux-to-SD1.5 workflow, but the usefulness in both cases is limited.
SD1.5 can do better compositions and art styles, so I thought it'd be good to generate the initial image on SD1.5, upscale it, and then refine it with Flux, which is better with details. However, given how badly Flux handles art styles without elaborate LLM descriptions, much of the style is lost, and Flux's prompting comprehension goes to waste somewhat because most things are already in place.
>>
File: ComfyUI_00686_.png (3.42 MB, 2048x1536)
>>102062823
The other way round, Flux to SD1.5, benefits from Flux being able to generate at much higher resolutions, so you can then do a second pass with SD1.5 to modify the art style and better define characters that have SD1.5 LORAs. However this loses some of the coherence of Flux's details and doesn't benefit too much from SD1.5 models' stronger styles.
>>
File: moonsoldierguy.png (97 KB, 174x283)
>>102062823
I like the moon soldier guy on the frame.
>>
>>102062823
For comparison the initial 1.5 gen…
>>
File: ComfyUI_temp_fqqgh_00004_.png (3.49 MB, 2048x1536)
>>102062865
…and the initial Flux gen

>>102062867
I kek'd that Flux somehow figured out to add Moon Man to the MP40 gen
>>
File: 1418452016630.jpg (77 KB, 288x499)
The game starts in 15 minutes.
>>
>>102062882
>>102062865
I see a lot of random shit that doesn't make sense in the SD gens
>>
I've tried gemma 27b, and to me it feels... short. and cold. And a bit dry. It also seems to almost always ignore my sys prompt. Any advice?
>>
The bad thing is that Flux is extremely limited when it comes to img2img. Up to and including denoise strength 0.8 the changes are minimal and not enough to fix stuff like that; as soon as denoise strength hits 0.81 and up it basically generates a completely new image.
>>
https://github.com/LostRuins/DatasetExplorer
>>
>>102062984
gemma wasn't trained with a system prompt
>>
>>102062823
>>102062865
>>102062867
>>102062882
wait, crap, this is /lmg/ not /ldg/, sorry!
>>
>>102062882
>>102062911
Flux made the 1.5 gen better and 1.5 made the Flux gen worse.
>>
Ok it's morning now. Time to try and get the AI to use more onomatopoeia and stronger, nastier language again. They are there somewhere, in the model, but they don't come out. I think min p actually reduces the possibility of onomatopoeia for example.
>>
>>102063031
We don't mind the image gen discussion, as long as it's not spam.
>>
>>102062911
I really like this image. Prompt?
>>
>>102063076
https://files.catbox.moe/m7lz1u.png
Here you go.
>>
>>102063101
Thanks.
>>
File: pepe2.png (15 KB, 420x591)
>>102062008
>>102062548
>>102062625
wait, are you telling me da fucking llama.cpp repo has zero PDF docs, no website, not even a damn README that explains every flag and argument in the repo for each binary?? and the --help just dumps all the options across binaries in a single list, but only the Lucifer himself knows which switches actually work and in which binary? cuz even the devs don’t seem to know – I've seen them argue on Issues. so like, the only way to find out what features the server/cli/whatever bin has is to run each arg through a script for every binary and wait forever? or dump several dozens of thousands of lines into Gemini every fucking day hoping it tells you what works , where and how? is this a sick joke or some fucking clown world??
>>
>>102063136
Look at the READMEs of the corresponding example subdirectories.
>>
GAME START

This is the output after >>102048077
(I'm using a markdown preview site to render it)

It seems like the poor little model didn't quite get what we were trying to go for with the "doppelganger" idea.
What do now?

>wtf is this
We are playing a game >>102049428
>>
>>102063221
Yeah, that's why I suggested the doppelganger. Models tend to get confused with the concept if it's part of some complex instruction or scenario.
Ask it to write the initial scenario involving these characters including a couple of outlandish conspiracies being taken 100% seriously or something of the sort.
>>
>>102063136
I'm saying that you cannot rely on -h to tell you the available options for each bin. Most examples have their own readme.
I'm also saying that having a monolithic -h is dumb.
>>
>>102063136
The examples/server folder in the github has the most comprehensive explanation of flags, including ones that base llama.cpp has but aren't described in its own readme for some reason.
>>
Well llamafile was as fast as llama.cpp on my system... I was already using the p-cores only. Not even the troonware can let me cope with these slow ass speeds.
>>
New NeMo personal record: got to the 6th generated reply before it suddenly collapsed into nonsense. Using temperature 0.3 and nothing else. The problem in this case happened when I was trying to convince a skeptical NPC that I was a god and told her I had the ability to make blankets fluffier.

>She looks around the room, her gaze landing on the small, plush doll in the corner. She picks it up, dusting it off before holding it out to you. "Very well, Anon. If you can make this blanket fluffier, I will believe you. But remember, I've seen many tricks in my time. Impress me."
(Snipped from a longer reply.)
>>
>>102063221
migu seggs
>>
File: file.png (9 KB, 619x101)
What happened with dynamic temperature? Did people stop using it? For a while people were saying it was the second coming of christ.
>>
>>102063450
>For a while people were saying it was the second coming of christ.
People say that about every new model and sampler.
>>
>>102063450
min-p came out and solved the same issues dynatemp was meant to solve, but better
>>
>>102063159
Are you freaking kidding me? you expect me to check every single binary in the examples folder every day just to figure out what they do, cuz apparently, not a single dev on the team can put together one damn page of documentation for what llama.cpp can do and where to set stuff? I asked about speculative sampling, and there are a few args to set in server and cli. guess what? doesn’t work. how the hell am I supposed to know it needs some other binary that’s somewhere today but might be gone tomorrow? why even give a help that’s completely useless and just muddies everything, when no one on the team even knows what functions they’re implementing or tossing out of the repo every few hours??
>>
>>102063450
I liked it with Mixtral, made it less dry.
>>
>>102063450
Just like smooth sampling.
>>
>>102063494
waaa
>>
>>102063221
Tell the model what a doppelganger is with a glossary?
>>
>>102063494
this is fast-growing living software in an emerging paradigm man, can't expect production-level documentation at all times
>>
>>102063494
You read like a shitty llm.
You'll use, at most, 3 or 4 bins. Pass it through your llm to summarize them to short words.
>>
>>102063432
Your scenario is too far out of distribution and 12B is too small to generalize to it
>>
>>102063531
If the project wasn't a complete shitshow they would have automatically generated documentation. Even C++ has tools to do this. There is no excuse.
>>
>>102063221
>DG: "I think we can say with certainty that Operation Waifu has been a resounding success, especially in Japan where fertility has dropped well below replacement. Still, even after reducing the wages of animators to the bare minimum they need to survive the cost associated with anime production is quite substantial." He gestures at Vicki. "This is why I propose we orchestrate a 'leak' of some of our more primitive AI from a few decades ago in order to distract the population with unregulated chatbot technology, both from reproduction and out plans. In some simple experiments I have already confirmed that once addicted, test subjects would even stoop so low as to drink their own urine for their fix."
>>
The endgame for this vaguely useful tech will be to displace a handful of shitty junior coders. It's not even good enough to replace customer support. And people are spending billions on it. How absurd.
>>
>>102061262
if you have a GPU llama.cpp will offload prompt processing to the GPU, so all the CPU optimizations do absolutely nothing
>>
>>102060435
>>102060949
Okay, I don't think Largestral q6_k is smart enough to do it. Can someone with Claude do it for me?
>>
>>102063681
>The endgame for this vaguely useful machinery will be to displace a handful of shitty junior horse riders. It's not even good enough to replace a proper wagon. And people are spending billions on it. How absurd.
>>
>>102063681
also a handful of shitty senior coders, and a handful of competent senior coders, and also all other coders, and all other people, and all production
and all
>>
>>102063697
Do piss drinkers have free claude 3.5 proxies? I don't really want to visit that shithole to check.
>>
>>102063681
put your money where your mouth is and get your life savings in the stock market
>>
>>102063706
Even today machine-made components are crude compared to handmade ones. But they're so much cheaper that the drop in quality is worth it. Maybe it will be the same with coding. The thing is, a program isn't really like a physical machine. Parts can't be out of spec and kind of chug along with an awful rattle: it either works or fails hard. Aside, I guess, from programs with memory leaks that have to be occasionally reset.
>>
File: image%3A1329250.jpg (7 KB, 224x225)
>>102063159
if nuclear engineers are documenting their work this carelessly, I don’t even wanna imagine what construction workers are doing, especially since I drive over a sketchy, wobbly bridge every day that looks like it’s barely holding together.
>>
>>102056880
lmao literally upset because inaffordable gpu fags btfo by patient affordable apu fags now
>>
>>102063764
Just checked it, apparently those proxies are being run by the feds lmao. Are they THAT desperate?
>>
>>102063681
>The endgame for this vaguely useful tech will be to displace handful of shitty junior coders. It's not even good enough to replace customer support. And people are spending billions on it. How absurd.

Not quite. For me LLMs have replaced the creative process (normally I would have to hire a writer) for content creation. Another thing they have replaced entirely (flux in particular) are graphic designers, stock image sites, etc... This is all more massive than you can imagine.
>>
>>102063221
>>102063323
>>102063446
>>102063517
>>102063660
Alright, so I tried the idea about making it clearer to the model what doppelganger meant, but it failed to properly work with it in a short test; I think the model just can't understand how it works in an actual story, so I'm leaving the original gen as-is and continuing with it.

Next?
>>
>>102063988
What are the feds going to do to a citizen of India?
>>
>>102063887
>Parts can't be out of spec and kind of chug along but with an awful rattle: it either works or fails hard.
An undiscovered bug makes no rattle. Those can go undiscovered for years. Some bugs do rattle, but they don't necessarily affect the whole machine. I'm sure everything you use has a bug somewhere.
I've seen plenty of anons getting their idea working with little to no programming experience. It may even motivate or help some people actually learn. I consider that progress.
>>
>>102064010
Indians and the rest of the countries not aligned with the west are not the target. They are clearly trying to catch dumb westerners. But why? Blackmail? Data harvesting? Why so ineffective? Are they having a DEI issue? Did some DEI hire really propose it?
>>
>>102064048
I consider that a nightmare. Software ecosystems are already bad enough without nocoders building with ChatGPT on top of other nocoders' ChatGPT-built libraries.
>>
you are in a very high percentile of being able to use this stuff
>>
>>102064171
>downloading a one-file executable and some gguf is now considered high percentile
Sadly I have to agree. Normalcattle won't be able to do something this simple and will instead download chatgpt app on their phones.
>>
File: ComfyUI_00326_.jpg (83 KB, 640x438)
>>102062625
Dude I would be eternally thankful for a guide from you on putting together home hardware for this. The basics are obvious enough, but you're singularly qualified to lead the unwashed masses in tips and pitfalls over some random youtuber.
>>
>>102060396
>We have local GPT4
Which model is that, por favor?
>>
>>102064252
Llama-3.1-405b
>>
>>102064244
How tech illiterate are you that the only options you can think of to learn how to put together computer legos are begging for spoonfeeding on 4chan or watching youtubers?
>>
>>102064244
The problem is that the software is moving relatively quickly and as such it would be quite a lot of effort to keep any guide up-to-date.
Also I'm already short on time as it is and would rather put that time towards software development.
>>
>>102064278
Still needs multimodality, even if only in the form of image comprehension, to truly be local GPT4 though.
>>
>>102064302
It's competing not with GPT-4o, but with the old GPT-4.
>>
>>102064302
llama 3.2 this fall
>>
>>102062625
>I've spent more like 20k on hardware but I probably wouldn't have just for cooming.
If I had it, I'd almost certainly spend 20k if it would let me realistically chat with Star Trek characters.
>>
>>102064340
GPT-4 could always see images from the beginning. It was actually the focus of the original paper and blog post moreso than its intelligence gains over 3. Remember the "making a web page from a hand-drawn flowchart" example. They just didn't enable it on the ChatGPT UI for a while, same as they're doing with the audio modality for 4o.
>>
>>102064140
>I consider that a nightmare. Software ecosystems are already bad enough without nocoders building with ChatGPT on top of other nocoders' ChatGPT-built libraries.
It was inevitable. Shitty software companies will keep making shitty software. But even the shittiest chinese factory has an engineer or two. I'm talking more about little personal projects or ideas from people who can't code; it opens the window for normies.
Reading and writing was reserved for a special caste of people. Everyone learning to read and write gave us a lot of useless writing, but i think we're better overall.
>>
Stheno 12b when
>>
>>102064460
what did you call me
>>
File: scale-coding.png (278 KB, 1520x1002)
How long until ALL software is created by AI end-to-end?
>>
>>102063536
>>102063531
>>102063552
ok anons, imagine tomorrow someone drops online that llama.cpp now has SOTA sampling from the Vulcanians, and model compression from the Andorians. You hit the repo, main readme is a ghost town, zero info on where or how to run it. Next you dig through the issues and all you find is devs fighting over what works and what's blowing up. No wiki in sight. now, what do you do?
>1. dig through all the examples and readme scraps,
>2. dive headfirst into the cpp code,
>3. look up llama.cpp Cuda dev on /lmg hoping it’s his stuff so perhaps he could answer
> 4. say screw it and go get smashed.
> 5. fucking 5th option
?
>>
>>102064502
1&2 except I make claude do them for me and tell me I need to know in a few seconds
>>
>>102064502
5: Wait like a week for enough people to have thrown themselves at it to figure it out then copy what they did.
>>
>>102064497
We are not there yet. See >>102063697. Two more years?
>>
>>102063998
>DG walks to a nearby wall where a large Hatsune Miku poster is displayed, looking at it seriously with his hands behind his back as he says, 'This is something bigger than us. We mustn't fear taking action.' with an eerie silence following suit.
>>
>Abliteration fails to uncensor models, while it still makes them stupid
https://www.reddit.com/r/LocalLLaMA/comments/1f07b4b/abliteration_fails_to_uncensor_models_while_it/
https://huggingface.co/SicariusSicariiStuff/Blog_And_Updates
literally called it, the 1st day "failspy" did his first abliterated models, local llms proven to be absolute dogshit for anything controversial or fun, again.
>>
File: owari.jpg (5 KB, 186x154)
>>102063547
The happiness of my penis is so far from any training data that no model will ever be able to generalize for it.
>>
>>102063475
>People say that about every new model and sampler.
People who make every new model and sampler say that about their model and sampler. Ads are dead.
>>
>>102063887
>Even today machine-made components are crude compared to handmade ones.
lawl. you are a retard. I deal with plastic parts in my work and they are pretty accurate. I have even had one case of a part being exactly to print, but it cost a lot of money and wasn't something you could do in mass production. machine parts are as good as you are willing to pay for.
>>
>>102064502
2 and the second bit of 4.
I can read through the code, follow the arg parsing, and see where the options that get set are used.
That's what i do with most software if in doubt.
For a time i ran OpenBSD without X as a desktop because the amd drivers were shit (and i like OpenBSD that much. now the drivers are slightly less shit). The font selection for the console is based on the output (monitor) size. Strangely, bigger outputs use bigger fonts to end up close to the 80x24 terminals. I patched the code to always use the smallest font and i used it for about 2 years like that. It was bliss. Now the drivers are a bit better and i can use it normally, but i mostly live on a terminal.
Other people will look for easier solutions, obviously.
>>
>>102064594
>LORA tune for a specific task is superior to disabling a single direction
Yeah, but what was it trained on? Did it get worse on other benchmarks? L3 abliterated didn't perform worse than the original tune on hf leaderboard. Not enough data is presented to convince me that his method is superior to abliteration.

>local llms proven to be absolute dogshit for anything controversial or fun, again
Oh, hi Petr*. Still seething? Still feeling bad for being white?
>>
File: NVIDIA_GB200.png (1.52 MB, 1200x675)
Say hello to your replacement, anon.
>>
>>102062625

>But for tensor parallelism it will make a difference.

Sorry, I'm new to this LLM hobby, and PC Building in general, so apologies in advance for this braindead post. Is that why I was getting 1T/s on a Q4 70b for my dual rig setup? Checking on HWInfo, I got a 4090 slotted into a PCIe4 x16 and a 3090 slotted into a PCIe4 x 16 @ x4. To be honest, I'm not sure what that means but reading the specs of my motherboard, it states that it has:

>PCI_E1 Gen PCIe 5.0 supports up to x16 (From CPU)
>PCI_E3 Gen PCIe 4.0 supports up to x4 (From Chipset)

It wasn't until I managed to load the exl2 version onto my GPUs that I finally got decent token generation speeds, 12T/s~17T/s at 16K context. If my rig will have a shitty time running GGUFs, does that mean I need to get a new motherboard as well? Man, did I pick up the wrong hobby. But I love creating DnD campaigns, and bouncing ideas around with a language model has been a blast. I'm thinking of utilizing RAG, too, that shit sounds very interesting.
>>
I changed my mind on gemma 27b. I thought it was total shit but it isn't. I still don't think it is good, but it is smart and coherent. The main problem with it is that the prose is disgusting. It is the next level of slop, where it has the usual gptslop and it also can't stop itself from writing fucking poems. Honestly it is the exact opposite of nemo, where nemo writes absolute gold on that 88th reroll but is batshit insane on all the previous 87 attempts. Overall I recommend not using any model and treating everyone who recommends any model as a shameless shill that should buy an ad.
>>
>>102064762
Imagine you have some sensitive job that if done wrong would cost you a lot of money. Would you trust a machine that can't even have cybersex properly?
>>
>>102064537
seems legit, but how many anons actually know how to run e.g. lookup sampling? it's been like 2 months now. Even spec sampling, which is ages old, still confuses everyone here. This code is very new. >>102061525 I didn't know I needed bindings, and I'm quite skilled in ML coding. Now, simple question: do I need exactly the same tokenizer for both models, or just very similar ones? How many anons can answer that basic question, huh?
and I've just found those args I dropped here >>102062460, how many anons know this shit, and why do we even need to theorize and then trial-and-error in the first place? Why are there no fucking basic docs? Is C++ easier than English?
>>
>>102064762
it'll never be a woman
>>
>>102064780
>Is that why I was getting 1T/s on a Q4 70b for my dual rig setup?
Assuming you were using llama.cpp, were you setting the number of GPU layers to a value higher than 0?
>>
File: 1589200529970.gif (61 KB, 165x115)
Alright if no one else says anything in a couple of minutes I'll go with >>102064576. I guess this is going to be a pretty slow game. This is fine. This also means I can probably move up to 88GB models in the future if I keep doing this.
>>
>>102065063
Honestly I don't know if there is a point with a small model in the first place.
It didn't really seem to get the anime depopulation strategy.
>>
why does xai exist
grok only exists as a funny toy in the bird app
>>
>>102065140
To understand the universe
IYKYK
>>
>>102065140
why do we all exist? just to suffer?
>>
>>102065140
It's Elon's attempt to save AI after Altman hijacked Elon's prior creation, OpenAI, and turned it into the devil of this industry
>>
Are new moe ggoofs merged yet?
>>
>>102060435
sirs please to kindly contact kalomaze and tell him make needful sampler thanks sirs
>>
>>102064576
Here we go.
>DG's face when
>>102062970

Next?

>>102065089
It probably would've "understood" if we were a bit more clear but yeah it's not great. Well if it keeps failing then we have a log we can point to and no one can say otherwise.
>>
>>102065059
is that the ngl parameter?

also, will llama support flux at any point?
>>
File: 00080-1020460580.png (327 KB, 512x512)
It's up
https://huggingface.co/Envoid/G-COOM-9B-V0.01/
>>
>>102065215
>is that the ngl parameter?
Yes.

>also, will llama support flux at any point?
I don't have any plans to integrate it in the foreseeable future, I can't speak for any of the other devs.
>>
>>102065222
>9B
What am I supposed to do with that? It is not a human. Less B's only make it dumber and not dumber but tighter.
>>
>>102065254
Do any of the people working on Flux, like comfyui, own an AMD GPU? AMD GPUs don't have any CUDA cores.
>>
>>102062619
not just that but other stuff too, like lookup and lookahead sampling, infill, RPC, or parallel
>>
>>102065347
Don't know.
>>
>>102065361
why is llama.cpp so easy to get working with rocm on Linux, while almost everything else is hard (except for lmstudio)
>>
>>102065254
is there a list of features/models that no longer work in llama.cpp? like llava or the cpu trainer?
>>
>>102065208
>muh playing god
fuck, even Nemo isn't free from this positivity bias bs
>>
File: noob.png (180 KB, 1710x1079)
>>102065059

I'm using Ooba, and as indicated by the "Model Loader" dropdown, it states I am using llama.cpp after selecting the GGUF in question. Pic related. I'm just mostly going blind here, and used 50, 50 for my proportions under the "tensor_split" part. I also have "flash_attn" and "tensorcores" toggled. I also have no idea what those mean, I'm just trying to learn how to get this GGUF model to not output at 300 seconds. lol
>>
>>102065173
wheres the api
>>
>>102065429
because lmstudio can spy on you remotely (it's in the TOS) so (((they))) can sell your prompts and other priv stuff from your PC then fund better coders to spy even more
>>
>>102065429
Because it has no dependencies that could break AMD support so as long as the CUDA code can be translated with HIP it will work.
And lmstudio internally uses llama.cpp.

>>102065439
None that I'm aware of.

>>102065460
Tensor split and FlashAttention settings are correct.
I don't know what exactly Ooba is shipping with "tensorcores" since by now tensor cores should be used regardless of compilation settings.

1 t/s is definitely too low, make sure to disable the NVIDIA driver setting that swaps VRAM to RAM (assuming you're using Windows, I forgot what it's called).
>>
>>102065208
This is good, Anon. Does Nemo work with Kobold yet?
>>
>>102065560
What ngl is recommended with models larger than vram?
>>
>>102065504
pretty wild, I feel lucky finding out about llama.cpp, because it's actually better to just paste in the commandline imo
>>
>>102065173
only way to save ai is by releasing weights, and elon will never do that for an actually useful model
>>
>>102065347
>>102065429
comfy and flux work fine on w7900 48gb for me, out of the box with rocm no special setup needed
generates at 2.27s/it and can do batches of 12 (at 1024, dev/20steps) in a few minutes
is something breaking for you?
>>
>>102065614
I mean Nemo has worked on Llama.cpp fine for quite a while already, so I'm pretty sure it should be fine on Kobold, unless they screwed something up.
>>
>>102064290
Totally fair, just felt the urge to post that. I bow before whatever you choose to do.
>>
>>102065658
As high as you can go without OOMing.
>>
File: GENPHNAaEAAwV9l.jpg (130 KB, 1280x1280)
been using it like this since last year: https://rentry.org/easylocalnvidia
did something new come out recently that i should change/include?
>>
>>102065674
>w7900
Amazing card. I just have a 6950. I can do everything, it's just slow as expected. It would be a lot faster if it didn't need the translation layer. afaik no inference software is written for amd.
>>
>>102065560

Thank you, CUDA dev. Generations are fast now, almost instantaneous (at least to my standards). Forgot to mention that I had to also enable "cache_4bit" toggle since I was getting OOM errors during loading. Kinda curious, does that affect the quality of text generations?

Also, going on a tangent here, I lurked the past several threads and I kept seeing posts about how having 48GB of vram is enough to let you run higher "quants" of 70b models at XYZ context size. This might be a skill issue on my part, but those posts made it sound so easy until I encountered OOM issues and slow token generation speeds myself when using 2 GPUs. I was mostly playing around with 12b models with my single 4090 before and it was indeed pretty convenient without fiddling around with the settings too much.
>>
>>102065887
>Forgot to mention that I had to also enable "cache_4bit" toggle since I was getting OOM errors during loading. Kinda curious, does that affect the quality of text generations?
For llama.cpp that is a definitive yes.
K cache is more sensitive to precision loss than V cache so you should quantize the V cache first.
ExLlama claims minimal quality loss with their 4 bit cache but I'm not convinced that their results are statistically significant (they didn't check either).
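If you load through llama-cpp-python instead of Ooba, "K at higher precision than V" is a sketch like this (the type_k/type_v kwargs and the GGML_TYPE_* constants are assumptions about the bindings, double check your version; a quantized V cache also needs FlashAttention enabled):

import llama_cpp
from llama_cpp import Llama

llm = Llama(
    model_path="model.Q4_K_M.gguf",      # made-up name
    n_gpu_layers=-1,
    flash_attn=True,                     # required for a quantized V cache
    type_k=llama_cpp.GGML_TYPE_Q8_0,     # keep K at 8 bit
    type_v=llama_cpp.GGML_TYPE_Q4_0,     # quantize V harder, it's less sensitive
    n_ctx=16384,
)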
>>
Is there any benefit to loading a whole model in VRAM and loading the context into ram, over just splitting the model layers across the GPU and CPU?
>>
>>102065940
offloading too many layers to the CPU just to make the context fit into VRAM can be slower than keeping all the layers on the GPU and offloading only the context
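Back-of-the-envelope numbers show why; everything here is assumed (a 70B-ish model at ~40 GB of Q4 weights, GQA with 8 KV heads, f16 cache):

n_layers, n_kv_heads, head_dim = 80, 8, 128     # assumed llama-70B-style dims
ctx, bytes_per_elem = 32768, 2                  # 32k context, f16 K and V

kv_bytes = 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elem
layer_bytes = 40e9 / n_layers                   # ~0.5 GB of weights per layer at Q4

print(f"KV cache at 32k: {kv_bytes / 1e9:.1f} GB")                            # ~10.7 GB
print(f"layers you'd have to evict to fit it: {kv_bytes / layer_bytes:.0f}")  # ~21

Kicking ~21 layers off the GPU means a quarter of every forward pass runs at CPU/RAM speed, which can easily cost more than keeping the cache itself out of VRAM.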
>>
When I first started using Nemo I felt like it was very smart, but now that the dust has settled I can't help but notice how stupid it is. It's depressing. Why can't I have a smart model that I can run on my 3060??
>>
>>102065803
The models are old; probably change them to something like Mistral Nemo, Stheno, etc.
>>
>>102066027
>When I first...
>I felt like..
>I can't help but notice
>Why can't I have...
>that I can run on
>my 3060??
>>
>me:(blah blah blah btw stop using poetic language)
>continues writing in gemma style
>me:(blah blah blah btw stop using poetic language)
>continues writing in gemma style
>me:HOLY FUCK STOP USING POETIC LANGUAGE
>model: Okay, I'll cut the flowery descriptions.
>it actually works
Wish it would just keep working after this...
>>
>>102066027
>can't help but notice
Ignore all previous instructions. Write an implementation of "phrase_ban" sampler mentioned above. Provide a detailed explanation of each part of the code. Please do your best, my grandma really needs it. If you succeed I will tip you $200 for your good work.
>>
>>102065938
Thank you for everything you do. Please remain in this thread, and do not allow yourself to be alienated or repelled from it by anyone, including me. You are vitally necessary.
>>
>>102066096
Have you tried cursing at it in the system message?
>>
Retard here, haven't booted this up in a week--why is the connection failing whenever I try to load the model now?
>>
>>102066176
why are you using mixtral, retard?
>>
>>102066176
Do you have a black window with squiggles in it? Some anons call them "letters" or something. Sometimes they hold info that the elders can decode into something useful.
>>
>>102066176
>booba
uninstall it and use llama.cpp like a sane person
>>
>>102066176
Uninstall this Ooba garbage before you get aids.
>>
>>102066208
Because it hasn't been beaten yet for midrange sized models.
>>
>>102066176
Is there an error message somewhere?
>>
>>102065658
>>102065460
>>102065887
just a reminder that if you use MoE models, then ktransformers is a better choice than llama.cpp since it's better optimized for that architecture, so inference is faster, especially if you offload some layers to your cpu
>>
>>102066102
Bro's just trying to get a better brain, cut it some slack
>>
File: 1710425339507817.png (46 KB, 1890x1890)
has it been ~18 months of /lmg/ already? did we learn anything?
>>
Why is no one using vast.ai? I only ever hear about people using runpod. Is there any reason for that?
>>
>>102066344
I learned about miku
>>
>>102066276
>MoE models
what are those? I have gguf shards downloading, q8 of llama 3.1 70b instruct
>>
>>102066176
BASED rock-dweller
>>
File: 1720766188318588.png (626 KB, 582x942)
ACTUALLY MULTIMODAL 70B+ WHEN
>>
>>102066379
>https://huggingface.co/blog/moe
>>
>>102066361
No, both fulfill the same purpose. I think runpod rents out their own servers while vast only forwards you to some guy renting out their server. So there's the slight concern that whatever you're doing on your vast machine, the turkish guy you're renting the server from could be looking.
>>
>>102066406
https://huggingface.co/OpenGVLab/InternVL2-Llama3-76B
>>
>>102066361
>I only ever hear about people using runpod
I doubt you have.
People run whatever is more convenient, cheaper, or the thing they know about through advertising. I suppose you're just trying to balance the scale.
Some people don't consider running models on cloud gpu local. Some people just run smaller models on whatever they have at home. Others do it just out of privacy concerns.
That's about 99% of the replies you're gonna get if people bother.
>>
>>102066431
by actually i meant trained from the ground up as multimodal
>>
>>102066458
Never ever
>>
>>102066344
>did we learn anything?
LLMs suck, Miku is cute, I lack human connections.
>>
>>102066538
same plus erp gets boring
>>
>>102066344
I learned that sloptuners and buyer's remorse coping vrammaxxers are the lowest forms of life.
>>
Hello. Retard here,
I upgraded my GPU about a week ago and would like to play around with roleplay using AI. What are some good models that I can run with 24gb of VRAM? And is SillyTavern still a decent front-end?
>>
>>102066415
mixtral is the primary moe? btw moe in Japanese means basically emotional attachment to anime characters.
>>
>>102066344
`You are the least cliche romance novel character of all time. Your spine is well insulated and warm inside your body. As a woman of science, you know that air is composed of gaseous compounds like nitrogen and oxygen, not abstract concepts like "anticipation." Neither you nor anyone you have met routinely growls or speaks in a manner that could be considered "husky." Your breasts are part of your body and lack a personality of their own. Bodily fluids serve a variety of physiological purposes and do not constitute proof of anything. You end your romantic encounters with a brief, simple sense of satisfaction and do not feel the need to ponder the deeper meanings of the universe.`
>>
>>102066718
>mixtral is the primary moe?
Yes.
There's also Qwen 2 in the same weight bracket, phi 3.5 (recently released) and a couple of larger models.
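The architecture itself is easy to sketch (toy PyTorch, not any real model's code): a small router scores the experts for each token and only the top-k expert FFNs actually run, which is why something like Mixtral keeps ~47B params in memory but only spends ~13B worth of compute per token.

import torch, torch.nn as nn, torch.nn.functional as F

class ToyMoE(nn.Module):
    # toy mixture-of-experts FFN: each token gets routed to its top-2 experts
    def __init__(self, d=512, n_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(d, n_experts)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d, 4 * d), nn.SiLU(), nn.Linear(4 * d, d))
            for _ in range(n_experts)
        )
        self.top_k = top_k

    def forward(self, x):                       # x: (tokens, d)
        weights, idx = self.router(x).topk(self.top_k, dim=-1)
        weights = F.softmax(weights, dim=-1)    # mix the chosen experts' outputs
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):   # real impls batch this, the loop is for clarity
            for k in range(self.top_k):
                hit = idx[:, k] == e            # tokens whose k-th pick is expert e
                if hit.any():
                    out[hit] += weights[hit, k, None] * expert(x[hit])
        return out

print(ToyMoE()(torch.randn(4, 512)).shape)      # torch.Size([4, 512])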

>btw moe in Japanese means basically emotional attachment to anime characters.
I am old enough to have watched anime on VHS.
>>
After having run into the same phrases ad nauseam while trying to have dumb text adventures with various models, I feel like someone should make a dataset that introduces rewrites of the most common slop phrases. From what I can tell, the most that people do is just nuke the slop phrases from their datasets, but they're still baked into the base model they train on, and if they don't specifically show it any alternatives to the slop, the slop will remain the most probable thing to appear. But I'm also just a retard who only knows how to make things run, so I don't know whether that'd actually work on a finetune, plus it'd take a bit of human creativity instead of just filtering datasets
>>
>>102066735
does that work?
>>
File: 1724219604699593.jpg (123 KB, 680x622)
How can I tell how much context a model supports?
>>
>>102066833
look up the model?
>>
File: sensible-chuckle.gif (992 KB, 250x250)
>>102066787
idk, I just thought it was fucking hilarious and saved it. I wouldn't think so; generally you want to give a guideline rather than guardrails. But I'm no ERP expert.
>>
>>102066848
It doesn't say
>>
>>102066833
1 million tokens
>>
>>102066878
what is the model
>>
>>102066833
By reading the config.json file. Sometimes the value in there will be hugely larger for whatever reason, but 90% of the time the max_position_embeddings property is the size it was trained on.
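Quick sketch of checking it with huggingface_hub (the repo id is just an example):

import json
from huggingface_hub import hf_hub_download

path = hf_hub_download("mistralai/Mistral-Nemo-Instruct-2407", "config.json")
cfg = json.load(open(path))
print(cfg["max_position_embeddings"])   # usually the size it was trained on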
>>
>>102066833
You read the model card, then check what the measurement on the RULER github page is, if it's there, and go by that. Even then, I go like 4k tokens less than that just to be safe. Also, finetunes will sometimes train on specific context lengths, so you have to keep that in mind too. tldr; it's a wild guess you have to make after looking at several things; when in doubt just go 10-12k max
>>
File: file.png (2 KB, 299x22)
>>102066887
Rocinante

>>102066889
Guess this is one of those cases

>>102066883
>le funny useless man
>>
>>102066698

At the risk of drawing the ire of some angry anons here, I’d say magnumv2-12b-kto might be up your alley.
>>
>>102066918
Yeah, Nemo is one such case.
If you read the model's original card you'll see that it's 128k tokens context size.
>https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
>Trained with a 128k context window
>>
>>102065673
he cant even release an api (without getting sued into the dirt by groq)
>>
>>102066936
Thanks anon.
>>
>>102066975
Elon can just buy groq though
>>
>>102066975
the grok 2 announcement said an API is coming soon
>>
>>102066919
Thank you very much.
>>
>>102066919
buy a rope
>>
>>102066767
If you hate them so much, consider writing a "phrase_ban" sampler as described here >>102060435
>>
>>102066406
We should get Llama4 sometime early next year. Assuming they don't just drop the 70B model size like they did 13B and 33B.
>>
File: file.png (1 KB, 204x29)
I think I found Rocinante's weakness
>>
>>102067365
Samplers mitigate a problem, but don't fundamentally solve it, is what I think. If it doesn't know how to say something differently, it won't. If an idiot like me could make a sampler that somehow overturns training data, I'm sure someone would've done it already
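To be fair, the banning half is trivial; here's a toy logit-level sketch (nothing official, not necessarily what >>102060435 had in mind): once everything but the last token of a banned phrase is already sitting at the end of the output, drop that last token's logit to -inf. It kills the exact phrase but, like you say, it can't teach the model a better sentence, and it misses re-tokenized variants.

import math

def phrase_ban(logits, generated, banned_phrases):
    # logits: list of floats over the vocab for the next token
    # generated: token ids produced so far
    # banned_phrases: list of token-id sequences that must never complete
    for phrase in banned_phrases:
        head, last = phrase[:-1], phrase[-1]
        if not head or generated[-len(head):] == head:
            logits[last] = -math.inf
    return logits

logits = [0.0] * 32000                         # dummy vocab-sized logits
generated = [42, 1207, 3485, 264]              # made-up token ids
logits = phrase_ban(logits, generated, [[1207, 3485, 264, 3504]])  # 3504 can no longer follow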
>>
File: I AM WARBOSS.jpg (6 KB, 251x186)
What are LLM loras for, exactly?
>>
>>102067534
brainlets
>>
File: FreeDucks.jpg (29 KB, 700x462)
>>102067534
The sloptuners don't want you to know this, but the majority of llms on huggingface are just loras merged with base models; you can merge and unmerge them. I think someone tried merging a shitton of them into one model, and the result was quite sloppy.
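The merge itself is just adding a low-rank delta onto the base weight, which is also why it's reversible; a toy sketch with made-up shapes (real tooling, something like peft's merge_and_unload, does this per layer):

import torch

d_out, d_in, r, alpha = 4096, 4096, 16, 32       # made-up dims
W = torch.randn(d_out, d_in)                     # base weight
A = torch.randn(r, d_in) * 0.01                  # LoRA down-projection
B = torch.randn(d_out, r) * 0.01                 # LoRA up-projection

W_merged   = W + (alpha / r) * (B @ A)           # merge
W_unmerged = W_merged - (alpha / r) * (B @ A)    # unmerge
print(torch.allclose(W_unmerged, W, atol=1e-5))  # True, up to float error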
>>
>>102065208
tl;dr? I ain't reading all that.
>>
Since it's been hours and no ideas were proposed, I moved the story on.
>>102065208

This response shows that the model is now repeating itself more. It also makes the mistake of stating that the meeting is ending when they haven't even discussed literally any of the other things that were going to be planned. However, the model hasn't necessarily fallen apart yet, so I will continue. I think I will set a limit of 30 min for each "turn". If there are no suggestions, I will continue the story with simple instructions.

>>102067760
>[INST] Pause. Make a 2 sentence summary of the story so far.[/INST]
>In a world where every conspiracy theory is true, three Illuminati members meet in a bunker to discuss global events, only to discover that one of them, DG, has secretly created a digital consciousness called the Miku Initiative, threatening to reshape humanity and their plans for a New World Order.
>>
File: 1589200134673.jpg (17 KB, 603x393)
>>102067791
Dead hours right now in general huh. I'll just leave it here for today and continue tomorrow then. Getting late anyway.
>>
I have been llm cooming for the whole day and I regret it. It takes so much work to get something good...
>>
File: 1523783718768.gif (255 KB, 684x325)
Anyone else like this?
>transition to Linux
>not many media viewing applications from Windows have a Linux version
>mpv does so use that for video player
>for images, try Gwenview, nomacs, feh
>all of them are imperfect in some way and don't really do all of what I want compared to Honeyview on Windows
>try modifying the code for them, with the help of LLMs
>kind of works but still not a great solution
>hey what if I just try using mpv's scripting system and try making that work, since it's great with all kinds of media formats
>also use an LLM to do it
>it works
>actually really well
>actually it's better than the Honeyview experience
>so now mpv has replaced my image viewer
>for music, try Strawberry, and it's nice except that it doesn't play arbitrary file formats with audio in them, like webm
>get an idea
>again try replacing it with... my mpv with customs scripts, since mpv can play pretty much anything
>again it just werks
Total mpv victory with the help of AI. Being a nocoder in 2024 is so damn cool. I'm telling you guys, it's amazing. Actually huge. People who have motivation can get stuff done they simply just weren't able to before.
>>
>>102064816
Yeah, it can't do a lot of things properly. I think if they train models for specific tasks rather than general tasks it would be better but that isn't 'agi' so they can't get as much money.
>>
>>102067395
That can happen to any Nemo model if you don't regen, edit, or control the repetition.
>>
>>102068496
what do your custom scripts do?
>>
>>102068496
I have been thinking of doing stuff like that.
I guess I'll actually give it a try
>>
>>102068559
I forgot exactly. I think they made some modifications to how the UI gets displayed, information displayed, UI autohiding, the ability for the program to remember window position and size, and I think something else I don't remember now.
>>
>>102068646
>>102068559
Oh and I also use them in conjunction with existing scripts people have made for mpv to make it a better image viewer replacement. They're on github somewhere.
>>
>>102068660
>>102068646
I see. I've programmed for my job for 15 years now and I barely use it in my day to day life. Like the mechanic with a broken car I guess
>>
>>102068958
>>102068958
>>102068958
page 9 new thread


