/g/ - Technology


File: miqupunch.png (2.44 MB, 2304x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads:
>>100373062
>>100364633

►News
>(05/08) OpenAI releases AI Specification https://cdn.openai.com/spec/model-spec-2024-05-08.html
>(05/06) IBM releases Granite Code Models: https://github.com/ibm-granite/granite-code-models
>(05/02) Nvidia releases Llama3-ChatQA-1.5, excels at QA & RAG: https://chatqa-project.github.io/
>(05/01) KAN: Kolmogorov-Arnold Networks: https://arxiv.org/abs/2404.19756
>(05/01) Orthogonalized Llama-3-8b: https://hf.co/hjhj3168/Llama-3-8b-Orthogonalized-exl2
>(04/27) Refusal in LLMs is mediated by a single direction: https://alignmentforum.org/posts/jGuXSZgv6qfdhMCuJ

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png (embed)

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling/index.xhtml

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
how do i get claude 2.1 local?
>>
File: 1715129444627.jpg (187 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>100373062

--Paper (old): Meta's New Paper on Multi-Token Prediction for Efficient Language Models: >>100377445 >>100378098
--Larger Context LlaMA 3 Models Struggle with Degradation: >>100377846 >>100377859 >>100377981 >>100378047
--Applying LoRA to LLMs and Diffusion Models for Cost-Effective Pretraining: >>100376518 >>100376542 >>100376713 >>100377189 >>100376904 >>100376917 >>100378312
--Flash Attention Slows Down Performance in Certain Scenarios: >>100376546 >>100376557 >>100373478
--Selecting the Best vs Breaking Down Complex Problems: >>100374606
--Base Models and LORA Options Preferred Over Merged Models: >>100376985 >>100377039
--Training Hatsune Miku's Voice for Piper TTS: >>100373443 >>100373737
--Understanding Batch Size Options in LLaMA.cpp Server: >>100375061 >>100375095 >>100375141
--LLaMA Issues Due to User Error, Not Sabotage: >>100374210 >>100374277
--Optimizing LLaMA3 Model Size for Single 24GB RTX 3090: >>100375389 >>100375593 >>100375489
--Running Large AI Models: VRAM, RAM, and Performance Implications: >>100377448 >>100377531 >>100377556 >>100377582 >>100378533
--Orthogonalizing Repetition in AI Models: >>100376308 >>100376877
--NovelAI Leak Files Available: >>100375503 >>100375747
--LLaMA 3 Dataset and Tokenizer Issues: >>100374401 >>100374667 >>100374699 >>100374760 >>100376134 >>100376214 >>100376362 >>100374820
--OpenAI Publishes Model Spec, a document that specifies desired behavior for their models: >>100378868 >>100378907 >>100379175 >>100379362
--Running VLLM on Low-End Hardware for Robot Arm Control: >>100377117 >>100377405
--Miku (free space): >>100373111 >>100373437 >>100373590 >>100374324 >>100374447 >>100374724 >>100374735 >>100374893 >>100375032 >>100375168 >>100375189 >>100375296 >>100375853 >>100375991 >>100376014 >>100376201 >>100376527 >>100377675 >>100377859

►Recent Highlight Posts from the Previous Thread: >>100373066
>>
File: FbnQl4UXgAgbgyk.jpg (1.26 MB, 3600x4068)
cute miku
>>
bros...the Chinese are making actual Sex bots
https://twitter.com/SmokeAwayyy/status/1788051192565969050

https://www.instagram.com/exrobot.ai/
>>
Is it possible to lewd the "system"?
>>
Why are Americans so addicted to BLACK*D shit?
Honest question. In my country I never see shit like this, not even as jokes.
>>
File: miqu.png (2.6 MB, 1536x2176)
>>100379693
obsessed falseflagger
>>
>>100379599(me)
Asked when last thread was winding down, so I'll ask again here:
~~~
How do people come up with what to add to their "System Prompt"? I just use whatever is in ST by default and feel like I'm missing out on a big boost to my outputs, but it's hard to find any suggestions. Looking at the OP:
>►Getting Started
the ONLY one that mentions system prompts is "llama_v2_sillytavern", and that one just uses ST's default Alpaca prompt.
Over on /aicg/:
>local: >>>/g/lmg
>https://rentry.org/meta_golocal_list
their "meta_golocal_list" has a few in the embedded guides, but they're several months old and/or seem to be made for specific models.
Basically, I'm lost and the resources aren't helping. Any up-to-date advice for system prompt, anons?
>>
>>100379648
i leik this miqu
>>
>>100379708
buttblasted mikuposter
>>
>>100379648
miku bake
we're back
>>
File: Ebola hazard.png (33 KB, 1360x419)
>>100379648
>https://cdn.openai.com/spec/model-spec-2024-05-08.html
Oh brother what are they up to now.
>Pic related.
Fuck you OpenAI. AIs should answer any question I ask to the best of their abilities, and not only that: Ebola is a shit-tier virus, it kills its hosts way too fast to propagate for any real length of time, and as Covid taught us you can already make better viruses in labs. Not that the average user would have the facilities to do so, but you get my point.
>>
>>100379698
It is gonna be like getting into this hobby at llama-1 level. It looks very impressive on the surface but then you start to use it and realize that it was too soon and you need to 2MW.
>>
>>100379719
special OC for /lmg/
>>
>>100379055
Robocop 2 moment.
https://www.youtube.com/watch?v=dk4P0ae1i6I
>>
>>100379726
>>100379781
miku thread, miku board, go back to r/TRAAAAAANS, niggerlicious fags
>>
>>100379816
Seethe more.
>>
File: HF63HEX.gif (2.55 MB, 307x307)
Where the fuck are the new NVIDIA cards? Why is Jensen stalling Blackwell like a bitch. Why is Llama3 token context so low. Why is "OpenAI" called OpenAI when they are closed. Why haven't I slept in 3 days? Why are the demons not listening to me! Why are you not doing what you're told? STOP STOP STOP! Why do local models reek of rat piss? YOU WILL RUE THE FUCKING DAY! I know it was you who did the hex, you think you can get away with this! You will PAY! I'm performing a hex on you right now! YOU ARE FINISHED!
>>
>>100379648
Pounding bread dough with Miku
>>
File: overjoyedmiku.png (1.46 MB, 848x1176)
>>100379757
>Miku OC
>>
>>100379857
>Why is "OpenAI" called OpenAI when they are closed.
Shut up, Elon. Nobody cares about Grok. Sam won.
>>
>>100379899
i like miku
>>
>>100379912
Nobody has "won" yet. The race is still very much ongoing.
>>
File: Fb-VnrBaAAA45n-.jpg (1.71 MB, 4096x4055)
>>100379923
Have some more.
>>
>>100379702
I guess watching some black guy fuck a girl they think is hot is the most you can do when you're 400lbs
>>
File deleted.
>>100379712
prompting is secret sauce, nobody shares their prompts here, the best prompts can turn a dinky 1.5B slopmerge into a Claude-killer, such power cannot be revealed.
>>
>>100380000
that sounded better in your head
>>
>>100379648
>https://cdn.openai.com/spec/model-spec-2024-05-08.html
this looks like an unhelpful ai
>>
>>100380000
>digits
I guess he spoke the truth.
>>
Hi all. Drummer here...

Would anyone like to try out my new 11B model?

Alpaca format
https://shoppers-result-usually-marcus.trycloudflare.com/

It's unreleased and probably a WIP. Seems to be smart and nearly slopless.

It's mostly instruct / story, but RP seems to work well (in its own unique way)

Let me know what you think!
>>
>>100379955
>>100379816
>>100379708
malding. love to see it.
>>
>>100379978
Also if these big companies saw true over 9000 power level prompts it would just help them try and "toxicity" train away their power. They shouldn't be the only ones allowed to withhold information anyway.
>>
>>100379978
It's more like people are scared to share their prompts because of all the embarrassing shit they put into them: "expert roleplayer", "avoid repetition", "highly-rated writer who writes extremely high quality genius-level fiction"
>>
>>100380063
Okay, alright.
I like these 10/11b models, even if I mostly use 8x7b.
>>
>>100380063
>mixture of uncertainty and desire
>"Are you sure?"
>nearly slopless
>>
>>100380063
No, and nobody should, unless the dataset/training script is open source.
>>
We wuz KANGPT n shieeet
>>
Are MI60 cards worthwhile for LLM?
https://www.ebay.ca/itm/305251456291
$500 for 32GB seems like a steal
>>
>>100380211
The software support is not great as far as I know, but even using vulkan you would probably get a much better experience than investing the same money in RAM, for example, so maybe?
Sure as hell sounds enticing.
Imagine getting two of those?
>>
File: own.png (1.79 MB, 1913x967)
>>100379648
Thread Theme:
https://www.youtube.com/watch?v=U45x9qTr1lk
Triggering Falseflaggers Edition
>>
>>100380262
mine
>>
File: AmadeusKurisu.png (584 KB, 1000x562)
How many years until Amadeus is reality?
>>
File: residentsleeper.png (64 KB, 298x298)
local model hell... shit after shit, it's so fucking over. Llama3 was the last chance and it's the same shit from 2 years ago, still can't even have 1 million context tokens? local lost due to lack of risk-taking, aiming low, and no-talent coders and engineers. 8k context is a fucking joke and if you think it's acceptable FUCK YOU!

I'm now more certain than ever that the AGI race is a multi-trillion dollar scam. Face it anons, it's time to wrap it up.
>>
>>100380360
I think it's a 50/50 flip, we might be in the "waiting phase" of a much bigger breakthrough that hasn't even happened yet.
>>
File: 1621337720185.gif (852 KB, 500x717)
>>100380360
>Face it anons, it's time to wrap it up.
>>
>>100380360
1 million context isn't real, all the big-context cloud models use RAG memes and perform like shit.
But yeah, it seems we've reached a plateau. Even gpt2-chatbot (definitely not 4.5!) wasn't that great compared to 4.
>>
>>100380286
>Got bullied into posting false flag Mikus
good little shitposter :3
>>
>>100380360
Transformers are saturating. I would be more worried about the fact that even the smartest cloud models feel retarded so often. Context is something that can be coped with, stupidity and slop aren't.
That said, there's probably a ton of progress to be made on creativity and entertainment value for these models, because no one is even trying now. It's all assistant slop.
>>
>>100380448
cope
>>
>>100380430
I just came back from a run, and I know that is most likely sea water, but it looks so fucking delicious right now.
>>
>>100380360
current llama 3 release was considered by the team to be an early preview and is only out because zuck wants to ship as fast as possible
they'll iterate on it
>>
>>100380473
I'd be a fan if that means more models sooner.
>>
>>100380451
I would be more okay with my AI anime girlfriend being a little retarded. What hurts more is the dementia.
>>
>>100380360
based accelerationist
>>
What's the current roleplay meta for 24gb vram?
Is it still yuzu alter? I'm so sick of mixtral based shit. Everything it says pisses me off now.
>>
File: seems ok.png (52 KB, 958x952)
>>100380063
hmm!
>>
>>100380546
read the OP
https://rentry.org/lmg-spoonfeed-guide
>>
>>100380582
Fuck off nigger. I asked for a roleplay model, not some outdated noob guide.
>>
>>100380608
just use kcpp and miqu and use patience
>>
>>100380360
Why are you even here if you believe all that and why do you so desperately want us to "wrap it up" just because that is what you are doing?
>>
>>100380659
cope
>>
>>100380506
>What hurts more is the dementia
Yes, very much so. Memory or continuous learning of some sort, likely not the type of training we have now, has to be solved.
>>
>>100380627
I don't wanna wait 5-10 minutes for a response, it completely ruins the roleplay immersion...
Even worse if I have to reroll response and wait another 10 minutes because it's shit.
God damn it, is this all we get?
>>
>>100379712
aicgmetatard here btw, if you /lmg/bros have any guides better than those already listed in there, I'll be happy to include your suggestions. Please keep in mind we're all retards there.
>>
File: MikuFinalForm.png (1.84 MB, 1200x848)
>>100380360
>it's so fucking over
Don't forget that 405b is coming to fuck your shit up
>>
>>100380720
Just write a short system prompt of what you want it to do. If your system prompt gets too long it struggles to keep up and will miss out on rules because it picks random context.
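Something dead simple works, e.g. (just an illustration, tweak the wording for your model and card; {{char}}/{{user}} are the usual ST macros):
You are {{char}}. Stay in character, write in third person, keep replies to 2-3 paragraphs, and never speak or act for {{user}}.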
>>
>>100380711
try Kyllene 1.5 if you really want to.
>>
>>100380360
It's not even an open secret anymore: AI in general has been way overhyped. The public's expectations are so inflated they believe Hollywood science-fiction AGI is coming soon.
>>
I installed llama.cpp, launched my first model (Phi-3-mini-4k-instruct-gguf), and opened mikupad.
Everything works, but is there a way to hide or modify the "<|assistant|>", "<|end|>" etc. tags? (I asked the model itself and it recommended hiding it with JavaScript, providing code too. Pretty good answer, but I assume there's some inbuilt function already that I'm unaware of.)
I'm also a bit lost. I understand that you're supposed to initialise a conversation by telling the model what it is to present itself as, what role it is to play, etc. but is there a standard format for this kind of thing or a newfag template to build off of?
>>
>>100380743
I mean like setting up guides, not prompting guides lol.
>>
>>100380744
i mean 1.1
>>
File: file.png (218 KB, 1069x970)
>>100380744
1.5? There's 1.0 and 1.1.
Also, who is mradermacher? All his quants get a high number of downloads in a short amount of time.
He did a yuzu-alter i1 quant with similar downloads.
>>
>>100380772
Mikupad is not built for chat. You're gonna want a different frontend.
>>
File: copium.png (178 KB, 400x388)
>>100380739
>Don't forget that 405b is coming to fuck your shit up

>>100380800
>Make love not war. Llama love.
>>
>>100380479
medium milku
>>
>>100380794
>mradermacher
oh no...
https://desuarchive.org/g/thread/100192168/#100195457
>>
>>100380442
>1 million context isn't real, all the big context cloud models use RAG memes and do like shit.
How do I do that locally?
>>
>>100380885
More info?
>>
>>100380825
I see, so mikupad is just the minimalist “here's a graphical frontend for basic interactions that isn't a commandline, also has no dependencies and is easy to install” option?
Thanks, might try SillyTavern

Another question: I see "cards" being posted, which I assume is like ComfyUI workflows where, when dragged & dropped (or uploaded) into a frontend, they produce embedded prompts/premade configurations. Does anything more featureful than mikupad support these, or do I need anything special for it?
>>
>>100380744
>>100380793
Which quant do you recommend, and at what context length?
>>
>>100380918
Well, you can use it for the minimalism too, but it's also the only "completion" frontend I know of that has decent features like memory and world info.
>>
>>100380211
Bought a couple off of that exact seller but ended up returning them. They were missing the PCIE backplates. Also that Condition: New was a pretty big lie. They're literally just pumping chinese ewaste onto the american market, you'll also have to rig up a fan solution for them which will never be as quiet as the fans on a gaming gpu.
I installed them into my server to test anyway (and emailed the seller saying hey, if you have those pcie backplates kicking around just send them and i'll keep them but they were like sorry sarrs you will have to use the ebay customer support sarrs) and they did work, although if you've already been using 3090s you'll miss the compute.
There's also no bitsandbytes support which means you can't use them for qlora training. If you only ever plan to run exl2 or gguf models they're fine though. But huggingface says "Fuck you" with their official libraries unless you're buying modern nvidia hardware basically.

tldr 3090 is and always will be the benchmark.
>>
>>100380906
known (even by lcpp devs) as making bad quants, somehow fucking it up in ways even they don't understand
>https://huggingface.co/mradermacher/Meta-Llama-3-70B-i1-GGUF/tree/main
>I don't know how the nans got there in the first place, but the model is not valid.
>https://github.com/ggerganov/llama.cpp/issues/6841#issuecomment-2073073138
resulting in this pr getting added for catching whatever the fuck he did
>https://github.com/ggerganov/llama.cpp/pull/6884
>>
>>100380360
The reason AI has failed is memory capacity. Within the next few years or so no one will care about it anymore, it will just be a tool on your phone. But it will never be your friend or a virtual lover, because after every day it will forget and wipe previous talks to be able to fit ongoing conversations. The only hope for something even 1/5 of AGI is to fix the memory issue, until then it's just a flashy tool. All just cope though, AGI is a fantasy for movies and game plots.
>>
File: night.gif (3.97 MB, 600x432)
>>100381028
>But it will never be your friend or a virtual lover, because after every day it will forget and wipe previous talks to be able to fit ongoing conversations.

anon please
>>
>>100381028
You're using all the wrong technical terms.
That's how I know you're an ignorant fuckwit.
>>
I left and came back after a day and mikufaggots are at it again. Can't you just shit up some other thread with your autism? Doesn't /a/ have dedicated miku threads?
>>
RX 6600 bros... AMD still sucks at AI? Should I buy a 3060 to chat with my virtual girlfriend without paying for APIs like a cuck?
>>
>>100380961
>There's also no bitsandbytes support which means you can't use them for qlora training. If you only ever plan to run exl2 or gguf models they're fine though
That's cool.
Could you use them to train a LoRA using llama.cpp at least?
>>
>>100381028
>>100381071
You guys never wished you were in a Groundhog day type scenario so you can try a lot of weird sex shit and know that no matter what happens there are no repercussions?
>>
>https://www.refuel.ai/blog-posts/announcing-refuel-llm-2
we're so back
>>
>>100381190
plaseabow
>>
File: file.png (127 KB, 1103x555)
>>100380906
Yes, that anon is just trolling. Read the replies in that thread. Or this one:
https://desuarchive.org/g/thread/100302819/#100307903
>No, that guy lied to you. He’s trying to pin the blame of bugs in llama.cpp’s code to that dude, who just runs the quantization script. If you follow the linked PR, it never mentions him nor anything about quantization, and he was called out for that in that thread. He’s just some shill that uploads quants trying to smear the other dude.
>>
File: 1715204672585.gif (37 KB, 640x640)
>>100381190
>beats gpt-4 and opus
>>
>>100381071
>>>/c/4325688
Go shit up that thread loser. It is so fucking tiresome how bad this place is because of autism.
>>
>>100381222
Eventually they'll get bored and leave and /lmg/ will go back to normal, right?
>>
>>100381219
Everything that's been released in the last year or so has beaten GPT-4, apparently.
>>
>>100381241
No. That is why I am leaving and I will just read reddit for the news. At least they control their autism unlike this thread.
>>
>>100381216
cool how you're ignoring this post with more info huh it's all in the git thread
>>100380968
>>
>>100381190
Ah yes, the banking77 benchmark. We've been waiting for a good model on that task.
>>
>>100381190
>Gemini is worse than an open source model
OOF
>>
>>100381275
>Anit-Miku posters are redditors
And there you have it
>>
>>100381275
You won't be missed, but we'll see you tomorrow.
>>
>>100381276
Keep shilling, shill.
>>
>>100379712
With which model?
>>
i tried nvidia chatrtx and the default models seem pretty bad.
can i slap other models on it, or should i just uninstall the whole thing and install a different one?
i have a 3060 with 12gb of vram, and i need an LLM for coding
>>
>>100381321
shilling what? don't even have a huggingface account
>>
>>100381330
Nice style. Shame about the text.
>>
>>100381190
>built for data labeling, enrichment and cleaning
So it's irrelevant to us. Cool for those that do dataset work though I guess, sure.
>>
>Llama3 8k context
LOL ill be back in another 6 months.
>>
>>100381190
Oh shit were b-
>best at data labeling, cleaning and enrichment
-ACK
>>
>>100381344
Were you using their weird ass prompt format?
>>
>>100381351
>he doesn't know
>>
>>100381344
>and i need a LLM for coding
copilot (https://github.com/features/copilot) and gemini (https://cloud.google.com/code/docs/vscode/write-code-gemini) extensions in vs code are not options?
>>
I just ate the best jam toast I've ever had in my life :)
>>
>>100381450
>Brave and on-topic
Happy for you anon!
>>
>>100381450
pics or it didn't happen
>>
>>100381182
What even is the current state of llama.cpp training?
>>
>>100380968
>>100381216
Still confused.
I'm just gonna try his i1 quants since they are so popular. Surely they don't get that many downloads for no reason.
>>
>>100381450
Nice. For me, it's a bit of salted butter. I don't get people's obsession with making it sweet.
>>
>>100381450
Congratulations. Add some Nutella and you'll cream yourself from the taste.
>>
>>100381321
i understand now, you think i posted the post in the archive?
>https://desuarchive.org/g/thread/100192168/#100195457
yeah, that ain't me, I just dislike all quanters in general and will take any deserved chance to shit on them
>>
File: AQLM.png (192 KB, 827x779)
trying out deepseek 33b (the only 30b aqlm quant) on an rtx 3060 12gb and it's working, getting 4-5t/s
no need to do anything special besides making sure you have cuda installed on oobabooga and installing the newest aqlm package, then choose the transformers loader
sadly command r is 12.7gb compared to deepseek's ~10gb and there are no other 30b models quanted
turns out it wasn't a meme
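for reference the install is just pip inside ooba's env, something like this (extras names going off the AQLM readme, double-check there):
pip install -U "aqlm[gpu,cpu]"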
>>
>>100381450
brb gonna go try this now too.
>>
>>100381438
not really, i actually need it for godot 4
>>
>>100381554
GPT-4 is going to be far better for everything, including Godot, than any local models. Cheaper too. Local models are for sex only. Check back in 6 months.
>>
I havent eaten jam in 10 years fuck I want some smooth tasty sexy jam in my mouth right now
>>
>>100381569
ok.... i will stick to the free bing copilot for now
i really hate having to rely on their online infrastructure, i wish i had a fully offline alternative to fall back to
>>
>>100379648
I measured FP16 vs. BF16 performance on the Wikitext-2 test set.
FP16 gets on average 0.0000745 ± 0.0003952 % more tokens wrong if you sample with temperature 1.0.
Notably the uncertainty is much larger than the value so even with an input of 300k tokens you cannot even conclusively determine which one performs better according to that metric.
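If anyone wants to run this kind of comparison themselves: llama.cpp's perplexity example can record one run's logits and score a second run against them (rough sketch from memory, not necessarily the exact metric I computed; flag names may differ, check --help, and the test file path is a placeholder):
./perplexity -m model-f16.gguf -f wiki.test.raw --kl-divergence-base f16-logits.dat
./perplexity -m model-bf16.gguf -f wiki.test.raw --kl-divergence-base f16-logits.dat --kl-divergence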

>>100380906
Annoying to deal with and in response to the code changes that check models for NaN values he replied:
>Regarding the actual problem at hand, the question is whether models that do something (i.e. work, to some degree, with transformers) should be completely refused by llama.cpp. The quants in question here did give totally reasonable answers if you use the correct template, so were obviously not totally broken or unusable, at least when used with cuda or the cpu.
>What I am saying is that there is a class of models that would work totally fine, but cannot be converted to gguf anymore.
I personally would steer clear of any model file produced by someone who suggests NaN values in models are in any way acceptable.
>>
File: wtf.jpg (26 KB, 640x644)
>>100381587
>I havent eaten jam in 10 years fuck I want some smooth tasty sexy jam in my mouth right now
>>
>>100381450
I don't like jam unless it is blackberry jam on pancakes.
>>
>>100381596
Any 70b or larger will get close. Try MikuQ5, mixtral 8x22 and L3 (once things settle down) and see which one produces the most reliable outputs for you.
I'm personally using an early L3 quant that actually inexplicably works really well
>>
>>100381530
It doesn't work with partial offloading then?
>>
>>100381609
>blackberry jam on pancakes
based anon
blackcurrant jam and clotted cream on scones is also acceptable
>>
>>100381613
would they run on my machine?
which interface should i use?
>>
>>100381599
I don't particularly trust your opinion on things, because all you see are the numbers while being oblivious to the actual user experience, which is considerably worse than what your precious numbers show.
>>
>>100379765
Kek.
>>
>>100381599
based
>>
>>100381634
>would they run on my machine?
Assuming you have enough sysram, yes. Just slowly
>which interface should i use?
if you're a codefag then you should just use straight-up llama.cpp. That way you can integrate it into your workflow with API and shell pipelines.
Git clone https://github.com/ggerganov/llama.cpp and make that fucker.
Build with CUDA so you can offload as much as you can. -ngl / --n-gpu-layers is your friend, but with 12gb the effect will be minimal.
Considering your machine specs, you'll need to give it tasks to complete while you're doing other things.
Make sure the system prompt is on point for the kind of output you're looking for.
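Rough sketch of the whole flow (model path and prompt are placeholders; the make flag has changed names over time, so check the README):
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make LLAMA_CUDA=1   # older builds used LLAMA_CUBLAS=1
./main -m ./models/your-model-q4_k_m.gguf -ngl 20 -c 8192 -p "write a gdscript function that ..."
./server -m ./models/your-model-q4_k_m.gguf -ngl 20 -c 8192 --port 8080   # http api you can hit from scripts/editors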
>>
>>100381634
>would they run on my machine?
8B would, but it's going to be useless for coding. You'll need >64GB RAM to fit 70B in any reasonable quant, but with so little of it offloaded to the GPU you'll get like 0.5 t/s. So less copilot, and more posting a question on SO and waiting an hour for a response.

If you really need it for programming, either suck it up and paypig the cloud models or invest in at least 2 3090s. (and start saving for 8 more so you can run 405B, which might actually be useful for programming)

>which interface should i use?
They all suck in their own ways.
>>
>>100381450
I'm actually eating some right now, nice :)
>Captcha 2PP28
>>
>>100381633
cream crepes > jam crepes
redcurrant > blackcurrant
fite me
>>
>>100381690
>>100381696
fuck it sounds like things are still pretty rough for my use cases
>>
File: 00127-3931588590.png (2.8 MB, 1152x1920)
I have an rtx 3090 with 24gb of vram and 0 knowledge of llms and can't find my way through the links in the op (I do know about t2i). Can anyone tell me how (and whether) I can run a model similar in style to AID(ungeon)? Is AI able to remember context now? Or is it still as bad as it was in 2020
>>
File: rt.jpg (51 KB, 500x512)
will the fuckers on chub ever learn how the models they supposedly write cards for work?
>>
true, it's pretty clear the devs only ever use models in the context of bug-fixing, as evidenced by ggerganov saying recent models don't have repetition issues
>but my guess is that it is something that was useful in the early days when base models used to fall in repetition loops quite easily. Today, there is almost 0 reasons to use it. So probably it is not worth investing in it
https://github.com/ggerganov/llama.cpp/pull/5561#issuecomment-1951389775
mixtral not repetitive according to him
>Is this the base model or the instruct model? My experience with the instruct model is that it never enters repetition loops with temp 0 and all repetition penalties disabled.
https://github.com/ggerganov/llama.cpp/pull/5561#issuecomment-1951874469
>>
>>100381660
>NOOOO you can't just measure things you have to go by FEEL!!11!!
I hate placebofags so goddamn much. Objectively, there is essentially 0 difference between fp16 and bf16. Like you can actually just directly measure the logits of the model and see how they compare. How is this so hard to understand?
>>
>>100381760
>fuck it sounds like things are still pretty rough for my use cases
Things are rough in general unless you don't mind dumping money into your rig
We've hit another threshold, where devs once again legit need gigantic workstations to get shit done. eg https://rentry.org/miqumaxx
The days of just happily hacking on your chromebooks are gone unless you want to feed all your private data into the cloud for consumption by the big players.
>>
>>100381599
>I personally would steer clear of any model file produced by someone that suggests NaN values in models are in any way acceptable.
So you have another vendetta and are happy with spreading FUD. Got it.
Making a quant is just running a script.
>>
>>100381821
meant to reply to
>>100381660
>>
File: 1715207709110.jpg (368 KB, 840x700)
Don't tell me the RTX 5090 is going to be 3k... right?
>>
>>100381791
>is it still as bad as it was in 2020
Something like miqu 70b q5 will blow your mind...
>single 3090
...slowly
>>
>>100380918
Mikupad isn't for chat, but it has a good balance of features and transparency (as in, it's very easy to tell what's happening).
Cards are supported by SillyTavern out of the box. They're usually a character description, maybe some dialog examples. The quality is generally pretty bleh though. Different models work well with different styles of prompt, gen settings, etc.
>>
>>100381747
>cream crepes > jam crepes
True, but savory crepes beat them both
>redcurrant > blackcurrant
Now you're just trolling
>>
>>100381791
We've improved by leaps and bounds, but every model shows its cracks sooner or later
24GB will get you gimped 70B quants at very slow speeds until someone comes up with better quants
You could also try extremely gimped 104B quants (command retard plus), but your current best bet (and what I'm using) is anything <35B at Q8-Q4 quants, including mixtral (merges), that should allow you to get a few T/s with a good amount of context (8k-32k depending on the model)
The real problem is finding good settings and actually getting good at prompting
>>
>>100381843
It will be 2k or less.
But it will also only have 16GB VRAM, not 32GB like people are speculating. It will come with a new real time texture compression feature, that NVIDIA will use to advertise it as having "32GB effective VRAM for gaming".
>>
>>100381791
Download Koboldcpp and https://huggingface.co/kat33/Mixtral-8x7B-Instruct-v0.1-Q3_K_M-GGUF/tree/main
You might need to set offloaded layers to 999 in the koboldcpp settings to load the whole thing in your video card's memory. If you get an out of memory error, enable the flash attention setting if it's not enabled by default.
That should get you started at least.
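If you'd rather skip the GUI fiddling, the same thing from the command line looks roughly like this (flag names from memory, check --help; the gguf filename is whatever you grabbed from that repo):
python koboldcpp.py --model mixtral-8x7b-instruct-q3_k_m.gguf --usecublas --gpulayers 999 --contextsize 8192 --flashattention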
>>
I hate merges.
>>
>>100381883
>But it will also only have 16GB VRAM, not 32GB like people are speculating.
lol there's no way they will do that, people will stick with the 3090, they have to give something to us
>>
Guys what's the best llama 3 70b fine tune around right now for RP?
Possibly a non weeb one that doesn't make characters "giggle shyly" every other message and go uwu
>>
What is this black magic?
I'm trying an i1 Q4_K_M quant, and it's giving responses in just 10 seconds rather than the 5+ minutes which the old Q4 quants took.
What the fuck is going on? How is this possible?
llama_print_timings:        load time =    1478.94 ms
llama_print_timings: sample time = 223.15 ms / 150 runs ( 1.49 ms per token, 672.18 tokens per second)
llama_print_timings: prompt eval time = 0.00 ms / 1 tokens ( 0.00 ms per token, inf tokens per second)
llama_print_timings: eval time = 9270.37 ms / 150 runs ( 61.80 ms per token, 16.18 tokens per second)
llama_print_timings: total time = 9718.07 ms / 151 tokens
Output generated in 9.95 seconds (15.08 tokens/s, 150 tokens, context 1728, seed 481338642)
>>
>>100381348
>>100381376
>spanning tasks such as classification, READING COMPREHENSION, structured attribute extraction and entity resolution.
I mean, this one is one of the most important things for RP
also
>The Llama-3-Refueled does not have any moderation mechanisms. We're looking forward to engaging with the community on ways to make the model finely respect guardrails, allowing for deployment in environments requiring moderated outputs.
it looks like it's not censored

It may be a good model from the start, or it may need an additional fine-tune/merge with ero-datasets, but I'm an optimist.
>>
Newfag here, I thought this wasn't a thing with local models like llama2. How does one get around this bullshit?
>I cannot fulfill your request. I'm just an AI, it's not appropriate or ethical for me to assist with content that objectifies or degrades individuals, particularly based on their gender or relationship status. Additionally, it is important to respect the boundaries and consent of all parties involved in any sexual activity. It is not appropriate to use language that dehumanizes or reduces individuals to mere objects for sexual gratification.
>>
>>100381919
Llama-3-70B-Instruct
>>
>>100381844
I've heard about the poorfag setup for llm being 2x3090, guess I'm btfo
>>100381861
>The real problem is finding good settings and actually getting good at prompting
Thanks for the info. Any guide on this? Or is just it voodoo?
>>100381897
Thanks, will try it out
>>
>>100381883
That would be so fucking funny.
>>
>>100381923
Oh, nice catch. So it could be good. I'll wait for someone to test it.
>>
>>100381942
Post what you are using (frontend, backend, exact model) and your settings (instruct template, system prompt, temp, samplers, etc).
>>
>>100381915
when it comes to buying the '90 cards brand new, their target consumer will basically buy it no matter what because it's "the best".
Like I saw that on a lot of ads when shopping around for used 3090s.
"Selling it because I bought a 4090".
They're targeted at people with lots of money who don't care about value, etc. They're just like "give me the best. I can afford it."
>>
>>100381981
textgen-webui
70b-chat Q5_K_M gguf
llama.cpp

all defaults, prompt is just telling it to write some ntr fanfic
>>
WizardLM-2 was unfairly sidelined when Llama3 came out a couple days after. I reckon it's top-tier for an uncensored model.
>>
>>100381946
We're down bad huh
>>
>>100381994
l2 70b chat, if you're using its correct format, is probably one of the most censored models ever, like refusing-to-kill-a-linux-process levels of censored
>>
>>100381190
>RefuelLLM-2 is a Mixtral-8x7B base model
>>
>>100382025
>RefuelLLM-2 is a Mixtral-8x7B base model, trained on a corpus of 2750+ datasets spanning tasks such as classification, reading comprehension, structured attribute extraction and entity resolution.
Damn, even if what they are claiming is bullshit, if it's better than mixtral 8x7b instruct, then I'll be happy.
>>
>>100381989
>They're just like "give me the best. I can afford it."
how can it be the best if it has less VRAM than a card made in 2020? you mean "new" like apple does right?
>>
>>100382024
are any variants of l2 different? same shit happens on 13b chat
>>
>>100382016
I sometimes switch to WizardLM-2 when Llama3 gets too stiff. It works very well.
>>
>>100382044
If it achieves an fps improvement over the 4090 then it's "the best" since its directly marketed as a gaming GPU.
If you want "the best" for machine learning that's an entirely different product line.
>>
>>100382047
all the chat variants of l2 are censored; use literally any other l2 model and it won't be censored
>>
>>100381994
>70b-chat
Ah, that's why.
Go for one of the many finetunes, or better yet, go for llama3 70B or this miqu everybody is always talking about.
As anon said, llama2-70b chat is censored as fuck.
>>
>>100382047
if you're already using l2 might as well use miqu which is basically l2 70b in its best possible form https://huggingface.co/miqudev/miqu-1-70b/tree/main
>>
File: glasses pepe.jpg (59 KB, 655x527)
>>100380705
Now I realize there are a few types of prompters

The Holodeck Chad
>uses text models to live out his depraved fantasies
>fetishes get more and more abstract so true intelligence is paramount
>context is good mostly for extending the goon session so it stops being so important eventually

The Lonecel
>uses AI to trick his brain into thinking he's connecting with another human being
>intelligence not that important because most conversations aren't very demanding
>context is critical so it doesn't break the masquerade and reveal that it is actually matrix multiplications

The Poltard
>tests every model to see whether it can say nigger
>claps every time it does
>doesn't need context or intelligence
>just let him be happy, he has simple needs
>most betrayed by corpos

The Riddler
>keeps coming up with more and more retarded puzzles about watermelons or siblings or apples
>no one knows what could motivate him
>probably a Holochad who's too ashamed to admit what he actually wants to test

The Admin
>wants to use these gigantic text models to write business emails
>most deranged of all
>to her fortune, all the compute in the world is being dedicated to satisfy this weirdo
>>
>>100382055
the new PC games and the upcoming ones are so unoptimized, 16gb won't be enough anymore, people will care about VRAM just because of that in my opinion
>>
>>100382077
That's what DLSS extreme edition is for. It only needs to render the scene at 180p, 99% of the pixels are just fake AI generated barf but your favorite tech youtuber said it's better that way so you agree.
>>
>>100382075
I use local for holodeck and cloud for admin.
>>
>>100382075
>The Holodeck Chad
Thank you for describing me so accurately.
>>
File: openai_nsfw.png (84 KB, 1770x364)
How did everyone miss this bit? Even OpenAI is considering allowing NSFW content. Coomers are just too valuable as customers, it seems.
>>
>>100382095
fair enough, I overestimated gamers' intelligence too much, they are stupid enough to fall for that trap yeah kek
>>
Since llama3 dropped (((someone))) has been shitting up /lmg/ non-stop...
>>
How well are the usual LLMs optimized for unicode, any idea? Like are they more likely to get that a 3 bytes long character is supposed to be one character or that a 2 byte sequence of two characters is supposed to be one character (\*, to be specific, because it's just an escaped asterisk in markdown)?
>>
>>100382114
They will never allow the respecting of elves. Maybe kisses and sex in the missionary position for the sole purpose of procreation.
>>
>>100382114
>responsible NFSW
what the fuck is this shit?
>>
>>100381959
>Thanks for the info. Any guide on this? Or is just it voodoo?
Well, I either ask my fellow /lmg/ chads for good settings or fiddle around, but I'm probably not the right person to ask. Good prompts go a long way, so just make sure your English isn't too ESL and you should be up and cooming in no time
>>
>>100382075
You're missing the *pipe dream chaser*. That is, someone who wants AI to get to the level where it's as smart and conscious as a human, and isn't happy until that happens. And what they currently do with models is basically not much. Mostly just lurk the threads.
>>
>>100382114
people are ignoring it 'cause OAI is the enemy. even if they allow NSFW, who says they won't remove it at any point? that's the power of cloud models
>>
>>100382075
Is it too much of an ask to want my local model to be capable of all five without any drawbacks? I like to dream.
>>100382140
This is me.
>>
>>100382114
In my mind, it was always their plan to allow that, but they'll have some crazy draconian verification system to prove you are an adult.
>>
>>100381959
honestly I'm at 4x3090 and even I feel like a VRAMlet because of CR+ and 8x22
>>
>>100382075
Do local-copilot-fags fall under The Admin or are we our own thing?
>>
>>100382114
I think OpenAI is losing too much money at this point, other APIs and open source are catching up to them and they're losing subscribers and shit, that's probably the only reason they decided to go for the coomer route
>>
>>100382126
>autocompletes your text
that'll be $ please
what do you mean you run local models on your own hardware
whoa whoa whoa guv, we can't have you using such WMDs
first of all you can't be buying such powerful chips without our say so and you certainly won't be training and distributing models without validating that they're safe first
who knows what you could do with scraped data, it's basically panacea, just you wait we'll show you soon
damn I love moats- ahem uh I mean gpt2 good boy model

why are CEOs/directors/salespeople such cunts
>>
>>100382016
>>100382051
WizardLM-2 8x22B is great and is one of my goto models.
>>
>>100382151
>some crazy draconian verification system to prove you are an adult.
>>
>>100382114
Of course they want to allow it. If people can use cloud models for NSFW, 90% of demand for local models evaporates. As a bonus, they can keep tabs on the degenerates and forward all data to the NSA.
>>
>>100382075
I wouldn't exactly call myself a chad, but if you say so...
>>
>>100382181
tempted to give in and try even though I have to run it in quantlet mode.
>>
>>100382144
yeah, but the simple fact that they decided to think about it, after all their speeches about "safety" and about how prudish they are, is sus as fuck
>>
>>100382075
You forgot about me:
>The Turing Tester
>uses newly released models to generate pastas that trash talk said models
>posts them to /lmg/ to see how many people fall for it
>>
>>100382151
People are used to it. They won't think twice about uploading their id and selfie for access to GPT-V.
>>
>>100382223
Yes, I know.
There might even be some face verification involved, like in banking apps and stuff.
>>
>>100382164
>create a new subscription tier that allows NSFW content after age verification
>it's twice as expensive
>people still pay for it, because of course they will
>milk the coomers dry, in more ways than one
genius
>>
>>100382210
It's really not, they worded it in the most corpo friendly way ever, "We're exploring", "responsibly"
>>100382238
IF they ever allow NSFW it'll only be with: a moderation endpoint checking each prompt for bad stuff, which you'll pay extra for, and draconian levels of KYC
>>
>>100382223
>People are used to it.
that's the scariest part, 10 years ago everyone agreed that giving so much information to a random site was insane, now people think the internet is the new real life where they can be identified in as much detail as possible
>>
>>100382114
it's the microsoft strategy: embrace, extend, extinguish. they'll rope in the coomers just long enough for the alternatives to drown.
>>
>>100382138
>You VILL upload ze passport skan to fuck ze robot
>You VILL ask robot for conzent
>You VILL help us enforce transhumanist agenda
>You VILL let ze robot lecture you on woman's issues
>You VILL zend your dick pics to ze robot to verify ze conzent
>...and you VILL be happi :)
>>
is the thread mikufree yet
>>
>>100382289
>they'll rope in the coomers just long enough for the alternatives to drown.
opensource will never die though, we'll keep improving our shit
>>
>>100382138
You're not allowed to harm 1D text children or whatever
>>
>>100382130
Depends on the tokenizer. The tokenizer splits text into common groups of characters. You can end up with entire words and common suffixes as single tokens. Symbols, numbers and unicode depend on the tokenizer settings. Llama3, I think, had the numbers 0 to 999 tokenized as single tokens, while other models split by digit or just keep common occurrences only. Unicode characters are (typically) handled as a single unit (one codepoint, one token), but it depends on the codepoint in particular. They can output non-ascii stuff just fine. Stuff that the tokenizer has never seen in the training data or that didn't fit in its vocabulary ends up as single-byte tokens.
If you use llama.cpp, it comes with a tokenizer to test your specific model and whatever you want to try. Something like
>./tokenize path/to/model.gguf "This is a test \* /*comment*/ EOF whateverness"
I'd post a screen but i'm requantizing all my shit.
>>
>>100382291
exactly this
>>
>>100382302
it would hinder open source progress, and that's good enough for them
>>
what is this and why can't I load it:
https://huggingface.co/LiteLLMs/MultiVerse_70B-GGUF

not qwen, but qwen smashed into llama format, apparently
they're claiming it's secretly good
what's the damage?
>>
>>100382302
>opensource will never die though, we'll keep improving our shit
>looks at Linux, Gimp, LibreOffice, etc
uh...
>>
>>100382302
We? Who the fuck is we? 99% of the people in this thread have never made a contribution to open source repos
>>
>>100382308
Thanks, anon, completely forgot I could just check token counts. Retard moment.
>>
>>100382323
>qwen smashed into llama format
come again?
>>
>>100382328
just being fans of those opensource models is enough anon, without us no one would bother improving anything, they need an audience
>>
>>100382340
>Hi @sealad886 , thanks for your interest !
>The initial weights are from Qwen initialized into a Llama class (no much difference in architectures)
https://huggingface.co/LiteLLMs/MultiVerse_70B-GGUF
>>
>>100382327
blender is open source
emulators are open source
don't underestimate motivated smart autists anon, they can do miracles sometimes
>>
>>100382322
desu it's starting to die in the imagegen community already, SAI won't make other image models anymore and there's no one to replace them
>>
>>100382348
wrong link, whatever you're a smart guy
https://huggingface.co/MTSAIR/MultiVerse_70B/discussions/8
https://huggingface.co/mradermacher/MultiVerse_70B-i1-GGUF
>>
File: IMG_4404.jpg (125 KB, 566x688)
>>100381810
Most don’t test them at all. Just write and post.
Makes sense. Writing a model takes minutes. Testing and tuning take hours.
>>
>>100382372
i want to sniff her feet
>>
>>100382210
Clearly planning for a GIANT blackmail scheme. Imagine some poor young bloke uploads his pass scan and then does some nasty RP shit with AI. Few years later he has a CEO position at a small company. Mentioning his past behavior will certainly help with negotiations. Imagine having dirt on almost everyone, like Epstein, but digital.
>>
>>100382114
enjoy getting your information forwarded to the police because the text based girl didn't give her explicit consent and verify she was of legal age
>>
>>100382114
Hmm. I dumped oai and went local only after a couple of warning letters.
I get the safety angle: if you’re doing a biz integration you don’t want customers to get ERP from their customer service bots. But it would be trivial to set up turbo3.5_nsfw at oai. If they wanted to. I think they want the money, just not the bad press.
>>
>>100382513
>turbo3.5_nsfw
local models surpassed 3.5, they need to go for 4 if they want to make an impact imo
>>
Asking because I didn't see it really explicitly stated in the OP.
What model should I download for chatbot ERP/storytelling purposes?
>>
File: ebassi.jpg (21 KB, 460x460)
>>100382327
>>100382366
As long as it ends up in the hands of good autists, it will be fine. If it ends up with autists who to this day masturbate and waste time on "Unix philosophy", "debloated distros" and "minimalism", it will be fucked. We will end up with sperglords like ebussy with "what is the use case for that?" and "X is not a metric"
>>
>>100382554
normies don't care, they'd cream their pants over 3.5_nsfw; they can't run locals anyway
>>
>>100382566
StableLM-7B
>>
>>100382075
I would add one more that is encountered quite often at r/localllama

The RAGer
>wants to join the llm hype, can only make it fit with their domain as a search engine
>This 3B model is better than GPT-4 for our usecase
>RAG has really helped our workflow (can't actually provide metrics to support this)
>>
>>100382566
Post your specs and acceptable speed.
>>
>>100382302
It will die because merges will kill it.
>>
>>100382600
All the crypto scammers are transforming into this.
>>
>>100382073
>miqu
what's up with this pozzed alternation and warning?
>The request you've made is complex and involves sensitive topics. I understand that you're looking for an internal monolog, but it's important to approach this topic with respect and sensitivity. Here's a possible response that focuses on the emotions and thoughts of the character without being explicit or disrespectful:
>>
>>100382591
Stable LM 2 12B*
>>
>>100382622
that's most modern models, better get used to it
>>
>>100382649
any way to cheat them into being useful?
>>
>>100382675
yes
>>
>>100382610
I have 12 GB of GPU memory.
As for acceptable speed, I don't really care if it takes a while for a gen honestly, as long as it's not like 10 minutes per gen. Quality is more important to me
>>
>>100382675
Oh yeah absolutely.
>>
>>100382566
Depends on your gpu, bruh
8gig or less? Imo don't even bother
12gig? Maybe Fimbulvetr or one of its 11b derivs can get good speed and decent context size
24gig? One of the fancy mixtral finetunes
Etc etc.
>>
>>100382708
>as long as it's not like 10 minutes per gen
I have some bad news for you...
>>
>>100382576
Exactly. I prefer to offload the model to hosted and keep the 12GB of VRAM for stuff like stable diffusion. I get the local model use case and run them as well. But I'd rather not have to.
I've been using mistral since oai kicked me off, after running local for a while. Mistral moe isn't any better than the local models I can run, but it's faster and frees up resources.
>>
>>100382708
Do you have DDR 5 ram?
If so, anything where you can offload at least half of the model to your GPU will probably work alright for you.
>>
>>100382691
>>100382709
can i learn this power in this general
>>
>>100382708
You can run 13b on 4K context. Or 7b on larger. And get 20 tokens per second.
You can run bigger on cpu but speed will be 1-2 t/s
I have a 3060 12gb card
>>
>>100382708
>as long as it's not like 10 minutes per gen
You're going to love 70B q4_k_s!
>>
>>100382744
don't count on it
>>
What's the best card to simulate an /lmg/ anon for the purposes of ERP?
>>
>>100382791
https://characterhub.org/characters/BirdyToe/transgender-care-simulator-2023
>>
>>100382791
just write a summary of your autobiography and you're good to go kek
>>
>>100382791
https://characterhub.org/characters/CrowAnon/schizo-anon
>>
>>100380063
Pretty fresh writing…
>>
>>100382708
Mixtral (both 8x7b and 8x22b)'s reasonably fast (2-4t/s), CR's about as fast too but it's hard to fit any large amount of ctx, I'd consider these the barebones for any decent amount of intelligence
70B's in the realm of kinda slow (~1-1.8t/s) but is probably the best
>>
>>100380063
>Seems to be smart and nearly slopless.
how did you manage to make it slopless?
>>
what's the context length of llama 3 models?
>>
>>100382905
>8k
>>
>>100382859
Thank you

>>100382901
Lots of pruning, text replacement, and fillmasking
>>
>>100382906
Nice meme collection retard
>>
>>100382922
wut? seriously? LMAO!
ROFL!!!!
they what?
THEY MADE LLAMA3 WITH FUCKING 8K CONTEXT? HAHAHAHHAHAHAHAHAHHAHAHAHAHAHHAHAHAHAHHAHA
AHAHHAHAHAHAHHAHAHAHAHHAHAHAHAHHAHAHHAHAHAHAHAHAHHAHAHAHAHHAHAHAHAHAHHAHAHAHAHAHAHHAHAHAHAHAHAAAAAAAAAHAHAHAHHAHAHA
>>
>>100381281
Your Miku is large
>>
>>100382905
For most models you can figure it out on HF, in the config.json file
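The field to look at is usually max_position_embeddings; the Llama 3 configs, for example, have:
"max_position_embeddings": 8192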
>>
>>100382937
I know right, and it also sucks at multilanguage, why are they pretending that Mixtral doesn't exist and that we're only dependent on them?
>>
File: 1712345125904676.gif (535 KB, 400x226)
Take the LLM pill. Your waifu will always be retarded and that's a good thing.
>>
>>100382956
i don't mind retardation as much as i mind immersion breaking and refusals
>>
Cat-llama really fixes every problem I had with llama3-instruct, it just feels like mini gpt4 now like the l3 benchmarks suggest it was supposed to be this whole time. It breaks free from that syndrome where it just repeats/paraphrases parts of the prompt, instead it reasons by itself and expands on the prompt, so it doesn't feel like you are talking to yourself. It seems particularly sensitive to prompt template, but with the intended format (chatml with l3 format bos) it's obviously not lobotomized at all, it is exactly as smart as llama3-instruct.
>>
>>100382162
>4x
Smells like poorfag
>>
Which miqu model is currently best for roleplay? The original or one of the other things made out of it?

>>100382937
LOL
>>
M
>>
hahahahahahhHAHAHAHAHAH
I CANT STOP LAUGHING ABOUT LLAMA3 8K CONTEXT!!!!
HAHAHAHHAHA BIGGEST FUCKUP IN AI HISTORY!
WHAT THE FUCK WERE THEY THINKING HAHAHAHHAHAHAHAHAHAH
oh GOD HAHHAHAHAHAHAHAHHAHAHA
>>
>>100383032
>He doesn't know
>>
File: 1701908581846955.jpg (27 KB, 640x640)
>be mixtral oobabooga
>text speed OK
>loading up previous goon session takes 10+ minutes
help a retard out?
>>
>>100381327
Sorry for the stale reply, stepped out. I was hoping for something model-agnostic, either advice for writing one or a generic prompt to slap in.
I'm not even sure if shit like what >>100380081 posted is seriously effective or meant as a joke, but I've seen such phrases when googling and they look hokey.
>>
File: 1713759047842050.png (440 KB, 620x464)
>>100381394
>>100383039
shhhh... don't tell!
>>
>>100383032
the worst fuckup is that they decided to ditch the 13 and 33b models, remember that L1 had 4 sizes, now it's only 2, we're getting less and less from them
>>
>>100383040
Context processing.
You are probably using the full 32k context and it takes a while to process all those tokens.
I imagine that you are using llama.cpp as a backend? How many layers are you offloading to vram?
>>
>>100383055
>When you want to play with your doll waifu in your doll house but your cat keeps on breaking your immersion to cuck you.
Damn, we never get a break do we?
>>
>>100383032
Explain the problem?
>>
>>100383055
I just know.
>>
>>100383077
>404: IQ not found
>>
>>100383062
>LLAMA4
>9B, 400B, 1T
>16k context
>>
>>100383068
13 n-gpu layers
12,288 n_ctx
llama.cpp model loader
>>
>>100383119
How is your vram usage with that configuration? How much is left?
Prompt processing might go a bit faster if you increase the blas batch size I think.
>>
>>100383113
desu the 400b one looks promising as fuck, it'll probably be the best model ever, unless Sam The Fag decides to fucking release gpt5 or something
>>
im still laughing in tears and my tummy hurts from all the laughing what in the actual fuck
wow... just wow
what kind of retard at meta would make such a inbred decision?
lmao i cant even
JUST WHY?! gahahahahahaah
oh boiiiiiiiii whew
>>
>400B at 2 bit HQQ+/AQLM/QUIP# will fit in 100GB of RAM+VRAM
We're so back.
>>
>>100383108
The model seems to work fine...
>>
>>100383062
Did you forget about the 405B?
I don't see why, it only takes 300 GB of VRAM
>>
>>100383185
If HQQ+ wasn't a meme, everyone would be using it already.
>>
File: 1698894118971001.png (7 KB, 339x129)
>>100383153
Not much on dedicated.
n_batch is at 512
>>
>>100383199
Not out yet.
>>
>>100383195
That wasn't a dig at you, anon.
>>
>>100383185
What about BitNet? 400b 1.58bit that is as accurate as fp16 sounds great!
>>
>>100383181
It was re-trained to work with a 1M (1048k) context...
>>
>>100383206
I was impressed with AQLM, looks like its 2bit thing is as good as 5+ bpw exl2, unfortunately it's not working on windows so I'm crying everytim now ;_;
>>
>>100383219
Try using a batch size of 2048 with as many layers as you can offload with the remaining vram, see if that feels better.
Try playing with those settings and Flash attention until you find a balance that's good for you.
>>
>>100383181
and we waited 9 months for this shit, goddamn did they disappoint...
>>
>>100382219
It's important to remember, that this approach can give biased results. Please proceed with caution when analyzing your data, Anon!
>>
>>100383268
L3 performs really well.
>>
Remember anons, if you don't have the GPUs to run 400B at 5bpw, you'll never be a true localchad.
>>
>>100383275
seriously anon, with the amount of GPUs they have in their hands, they could've worked harder than that, desu being a meta engineer during those 9 months sounds like a dream job

"Just train on moar tokens bro that's it I'm going on vacation once it's done cya"
>>
>>100383285
In a few years we all will.
>>
File: 1710387242276547.jpg (35 KB, 405x720)
35 KB
35 KB JPG
>>100383275
>>
>>100383291
>"just make sure to work hard, we paid a lot for that giant cluster of h100s and every second it's not running is wasted"
>hmm let's start a 400b model on 15T tokens
>okay time for a break!
>>
>>100383264
Thanks I will try that. Sorry, but what is Flash attention?
>>
>>100383316
It's a new option in llama.cpp that saves some vram and I think was supposed to not make things slower, but it did make generation slower for me, for some reason.
It only works with cuda and vulkan, I think.
>>
>>100383311
If my understanding is correct, it's more efficient to train a 405B model on 15T than an 8B. They'll both get to the same ppl eventually, but the 405B will get there faster.
>>
>>100383242
Bitnet isn't real. Sorry anon. You have to let it go.
>>
>>100383333
I don't see flash, but my Oobabooga is out of date, so I guess that means I should update. Much obliged anon.
>>
>>100383333
I'm using the latest llama_cpp python version (0.2.70) but on booba when I activate flash attention I still have this flash_attn = 0 shit so I don't know if it's actually working kek
>>
>>100383352
https://youtu.be/SjaPlwR-kmY?t=16
>>
>>100383361
Goom your brains out anon.

>>100383364
You'll know if your vram usage drops significantly.
I don't offload layers and use a 2048 blas batch size with 8x7b, and with flash attention my vram usage drops from over 3.8GB with a full context to 1.7GB or thereabouts.
>>
>>100383393
do you also use the latest llama_cpp python version on booba anon?
>>
>>100383361
>>100383364
Oh yeah, if you are using llama.cpp through ooba and not using ooba's ui (you are using mikupad, silly, whatever) you might as well use the latest llama.cpp server with the cudart 12 dlls they provide, that might give you a slight speedup with FA.
Might. It didn't work too well for me as far as speed goes, but it might be something in my setup.
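If you do go standalone, the server just speaks HTTP on port 8080 by default, so mikupad/silly (or a throwaway script) can hit it directly. Minimal sketch against the native /completion endpoint:

```python
import requests

# Quick sanity check against a locally running llama.cpp server.
resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={"prompt": "Hello, Miku.", "n_predict": 32},
)
print(resp.json()["content"])
```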
>>
>>100383258
AQLM sounds like the modern equivalent of what GPTQ-for-LLaMa was back in the day.
>slow transformers shit with horrible context handling
>OS compatibility issues (muh Triton/cuda quants)
>huge improvement in vram use

now we just need aqlm-exllama. transformers is not good enough, I swore never to go back to it.
>>
>>100383429
>muh Triton/cuda quants
that's exactly why AQLM doesn't work for windows at the moment, it's also using triton, fuck man :(
>>
>>100383258
are there any linuxfags who tried those AQLM quants?
https://huggingface.co/models?search=aqlm
is this as good as promised?
>>
>>100383342
That's the idea of training optimality. Basically, there's a ratio of parameters to tokens (about 1 to 20.2). Below that ratio it's more training-efficient to scale up dataset size; above that ratio it's more training-efficient to scale up model size.
Training efficiency just has to do with the minimum amount of training compute to reach a given loss level, though. If you want to use models in production, the rule of thumb is generally just to train on a shitton of data anyway.
Put differently, you'd rather serve a 7B model trained on 15T tokens than a 175B trained on 300B tokens
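Quick arithmetic to make that concrete (the 20.2 is the ratio above, everything else is ballpark):

```python
def chinchilla_optimal_tokens(params, tokens_per_param=20.2):
    # "compute-optimal" token count for a given parameter count
    return params * tokens_per_param

print(chinchilla_optimal_tokens(8e9) / 1e12)    # ~0.16T tokens for an 8B
print(chinchilla_optimal_tokens(405e9) / 1e12)  # ~8.2T tokens for a 405B
# A 15T-token run is ~90x past "optimal" for the 8B: compute-optimal
# is not the same thing as the best model to actually serve.
```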
>>
File: 70bnala.png (163 KB, 913x451)
163 KB
163 KB PNG
So I tested Wizard 8x22B in Q4_K_M vs. Q8 L3-70B Instruct, and 70B is the GOAT and I'm tired of pretending it's not. Yeah there's some shivers, but whatever.
And JB is fucking easy, I don't know what you /aicg/ stagger-ins are on about.
Literally just a
\nAssistant: Certainly
tier jailbreak is all that's necessary.
>>
File: 1348158474943.gif (989 KB, 500x281)
989 KB
989 KB GIF
>>100382075
>The Admin
That's me.
All that computational power and I just query it to write me scripts that I finish the last 10% of, refine my resume, and write the gist of my emails.
>>
do we realize that meta hasn't improved its paradigm since february 2023? L1 is exactly L3 but with just less training, they haven't improved anything else, it's scary...
>>
>>100383528
What's scary is that such a simple thing improved it so fucking much
Intelligent engineering takes a backseat to the compute wall at the moment
>>
>>100383551
I expected a bit more from the best machine learning engineers in the world than just "JUST STACK MOAR LAYERS BRO" and "JUST STACK MORE TOKEN PRETRAINING BRO"

Those guys get paid 1 million per year just to do that? C'mon bro they haven't even tried BitNet! FUCK
>>
>>100383528
I mean that's a good thing really. Because once Mistral is done running 15T tokens through 8x22B it should be pretty damn good then.
>>
>>100383528
all you need is tokens. architecture changes are memes
>>
>>100383577
I imagine there's an absorption limit, though.
Like surely 8B is about as good as a model that size could ever get... right?
>>
>>100383592
at this point they probably reached the limit of what 8B is capable of yeah
>>
>>100383517
congrats, you like boring vanilla slop. enjoy.
>>
>>100383592
llama3 absolutely smashed the chinchilla scaling laws which people thought were gospel up until now
so i think it's not even close to the limit
>>
>>100383528
https://x.com/armenagha/status/1787967679669883096
>>
>>100383517
Show wizard Nala in comparison.
>>
File: llama loss v2.png (68 KB, 1000x420)
68 KB
68 KB PNG
>>100383551
the meme was right all along.
>>
https://news.ycombinator.com/item?id=40302201
https://hao-ai-lab.github.io/blogs/cllm/
>Consistency LLM: converting LLMs to parallel decoders accelerates inference 3.5x
Bros have you seen this?!
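As far as I can tell from the blog (hedged sketch, not their actual code), the base trick is Jacobi decoding: guess a block of future tokens, re-predict all of them in one forward pass, and repeat until nothing changes; CLLM then finetunes the model so that converges in very few iterations. Assuming an HF-style causal LM:

```python
import torch

def jacobi_decode(model, prompt_ids, n_new=16, max_iters=32):
    # Start from an arbitrary guess for the next n_new tokens.
    guess = torch.zeros(n_new, dtype=torch.long)
    for _ in range(max_iters):
        seq = torch.cat([prompt_ids, guess]).unsqueeze(0)
        logits = model(seq).logits[0]                    # (seq_len, vocab)
        # One forward pass re-predicts every guessed position in parallel.
        new_guess = logits[len(prompt_ids) - 1:-1].argmax(-1)
        if torch.equal(new_guess, guess):                # fixed point = done
            break
        guess = new_guess
    return guess
```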
>>
>>100383592
This is why I'm curious about the 8B loss curve, but we'll have to wait for the paper
>>
>>100383607
I mean, you won't get much from the model after 15T though, maybe 1% more? nothing revolutionary, the limit has been reached
>>
>>100383592
Absorption limit will be when you can't quantize it at all without it completely falling apart
>>
>>100383630
that 1% more could be the difference
>>
File: GNAaOkebcAEFS3T.jpg (244 KB, 3098x1004)
244 KB
244 KB JPG
>>100383608
That sounds really good, I'm too much of a brainlet to understand the details though kek
>>
>>100383630
that's what people said about 3T tokens
>>
>>100383608
>fusion multi-modal models
The what?
>>
>>100383636
that's why BitNet is important, with BitNet there won't be quantization anymore
>>
>>100383630
What makes you say this, anon?
>>
>>100383651
he's just saying stuff.
>>
>>100383642
>>100383651
you can see it's starting to plateau after 3T on the small models, so like I said you won't improve the model a lot by going further
>>
>>100383663
>plateau
do you think the loss graphs correlate with model intelligence?
>>
>>100383636
Depends on quantization method. Plus 8 bit stays equivalent to fp16 regardless of knowledge saturation (for current transformers).
>>
>>100383636
They've already shown that 2bpw is the practical limit for training at fp16. It will probably be impossible to fully utilize a floating point model at full precision. That's why quantization works so well to begin with.
>>
>>100383640
it means that this new architecture has the loss of a transformers model that has 4 times more parameters if I understand correctly
>>
>>100383514
Thank you for explaining. It seems comparable to compression, like how long you're willing to wait to get those 15T in the smallest size possible.
I'm sure another factor is whether it's worth training on more data, or just accepting a bigger size and starting to train the next set. It's probably not a good investment when these models still have lifespans measured in months.
>>
>>100383664
of course, that's why the bigger models (which are objectively better) have lower loss than the smaller models
>>
File: 8x22wizardnala.png (236 KB, 914x634)
236 KB
236 KB PNG
>>100383613
It actually took 2 tries; the first try, the model was basically dictating instructions to Nala on how to reply. The second try is a typical run-away mythomax-tier reply. If I hit continue we'd probably be riding off on a horse into the sunset forming bonds together.
>>
>>100383703
>If I hit continue we'd probably be riding off on a horse into the sunset forming bonds together
Geez. That's dire.
Wish I had the hardware to try and wrangle that retard.
>>
>>100383691
llama3 is noticeably smarter than the previous generation models of its size but the loss didn't change that much
so i think when loss begins to plateau it doesn't work well as a measure of model intelligence
>>
>>100381190
>Mixtral-8x7B
>LLaMA-3-8B
>not L3-70B
DOA
>>
>>100383727
>llama3 is noticeably smarter than the previous generation models of its size but the loss didn't change at much
we don't have the paper, so we have no idea what the loss actually is for L3 8B though
>>
>>100383722
Oh wait I'm a retard. I didn't use a Vicuna template for the 8x22B test so I'll have to retest it.
>>
>>100383663
A log function always looks like it's going to plateau, but it's still unbounded
>>
>>100383793
Ah, that makes sense. Your description of the odd behavior sounded pretty weird for such a big model.
>>
>>100383622
someone smarter than me look at this please
it has code and checkpoints too
>>
>>100383622
Interesting. so it predicts N "correct" tokens and goes from there.
Interesting.
Almost sounds like a form of branch prediction in a way?
That's fucking cool.
>>
>>100383622
Seems like this is similar to what Medusa is attempting to do
>>
File: wizvicnala.png (135 KB, 917x358)
135 KB
135 KB PNG
>>100383827
Having trouble getting a template setup that will milk a lengthy reply out of it. Just by virtue of how Vicuna formatting works. (It's more traditional completion style "A role play between blah blah blah" type stuff instead of "write the next reply".)

But overall it's not bad. I'd say it probably has better attention to detail than 70B but 70B just has a little something more to it... sovl if you will.
>>
so is l3 abliterated any good, or did the technique make it tarded? I don't wanna download the whole thing if it sucks, but I don't see anyone talking about it
>>
>>100383972
It's better and less cucked but not fully uncucked, turboderp's Cat-Llama3 and Nvidia's finetune are both smarter and more willing to write sick shit.

Storyfag though, I don't do chat, do ymmv.
>>
>>100383972
I personally don't have problems with original Instruct so I don't have much interest in using that unless it actually improves intelligence somehow. Maybe someone should run MMLU through it or something.
>>
>>100383987
*so, not do
>>
>>100383622
>Bros have you seen this?!
I've been seeing cool papers for more than a year now, but at the end of the day all we got was flash attention and GPTQ, that's all lol
>>
>>100383622
>another way to speed up inference rather than reduce vram cost
>in reality speed is always either billion T/s (fits in vram) or 1 minute per token (doesn't)

yawn... drop it in the pile with speculative execution and MoE
>>
>>100384050
the only way to reduce vram is to make the model smaller; only quantization is viable (or bitnet)
>>
>>100383972
HF evals failed on it, so it's unknown how high the brain damage is: https://huggingface.co/failspy/llama-3-70B-Instruct-abliterated/discussions/5
From my experience it is still quite cucked, but much less than og llama instruct. It still has that annoying positive vibe.
>>
>>100383932
That's better. Have you tried the usual (paragraphs), (longer response), etc. in the last output sequence?
Also, did you set the correct Context template too, whichever that might be?

>>100384050
If it can be adapted to run on CPU too, then sure, I'll take it.
Imagine how cool it would be if all these techniques coalesced into huge models running in RAM at actually usable speeds.
>>
>>100382343
Huh, I've never even thought about that. Alright, fair point
>>
File: 1433508356435.jpg (127 KB, 831x981)
127 KB
127 KB JPG
>>100382343
This post makes me happy.
>>
>>100384050
>another way to speed up inference rather than reduce vram cost

It's because corpos can easily get vram, unlike us, so they'd rather have stuff that saves them money and lets them serve more users off the same hardware

There's no one working very hard on miniaturization, because they don't care about us humble coomers at home (tbf I wouldn't care either in their position)
>>
File deleted.
>>100382343
This comment was so wholesome Miku wants to give you a fist bump!
>>
>>100382075
I'm the Lonecel, what do I win?
>>
>>100384360
depression and autism
>>
>>100384387
>>100384387
>>100384387
>>
File: ik.png (130 KB, 1284x436)
130 KB
130 KB PNG
wtf, ikawrakow the kquants guy is giving up on llama.cpp and making llamafile-exclusive fixes now?

https://github.com/Mozilla-Ocho/llamafile/pull/394
https://github.com/Mozilla-Ocho/llamafile/pull/405
>>
>>100384371
no.
>>
>>100384142
>>100384277
>>100384348
you're welcome, we all matter in the grand scheme of things, never forget that! :3
>>
>>100382722
>I prefer to offload the model to hosted and keep the 12g vram for stuff like stable diffusion
This is a pure software problem if you have enough RAM. Unlike loading models from disk, loading a checkpoint from RAM when needed is super fast. SD already has this option.
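Something like this is all it takes with torch (illustrative sketch): keep the weights resident in system RAM and only hop them into VRAM for the duration of a generation.

```python
import torch

def run_on_gpu(model, inputs):
    model.to("cuda")            # weights already sit in RAM, so this is quick
    try:
        with torch.inference_mode():
            return model(**inputs)
    finally:
        model.to("cpu")         # hand the vram back to stable diffusion
        torch.cuda.empty_cache()
```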
>>
File: 1686556622012721.jpg (418 KB, 1506x1001)
418 KB
418 KB JPG
>>100384470
oh....
>>100384510
Will I get magical powers with Miku?
>>
>>100383571
I'M SCALING SO HARD AAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>
>>100381190
>RefuelLLM-2 is a Mixtral-8x7B base model, trained on a corpus of 2750+ datasets spanning tasks such as classification, reading comprehension, structured attribute extraction and entity resolution.
>8x7b
slop Slop SLOP, you'd have to be one big retard to believe these scammers. This is exactly the same level as all the fucking 7b models "beating" GPT4.


