/g/ - Technology


File: ForbiddenArts.png (1.4 MB, 800x1248)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102179805 & >>102167373

►News
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed
>(08/29) Qwen2-VL 2B & 7B image+video models released: https://qwenlm.github.io/blog/qwen2-vl/
>(08/27) CogVideoX-5B, diffusion transformer text-to-video model: https://hf.co/THUDM/CogVideoX-5b
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1707466334580606.png (1.26 MB, 1024x1024)
►Recent Highlights from the Previous Thread: >>102179805

--Discussion on AOM-SXMV setup, power requirements, and power limiting: >>102180886 >>102181031 >>102181119 >>102181120 >>102181250 >>102181283 >>102181363 >>102182149 >>102182197
--Work in progress on FUTO-like Whisper functionality for PC: >>102189703 >>102189988
--Llama's originality and 405b model discussions: >>102188498 >>102188646 >>102188686 >>102188707 >>102188761 >>102188802 >>102188914 >>102189143 >>102189215 >>102189221 >>102189254
--Anon experiences issues with speculative decoding and draft model, others suggest potential fixes and workarounds: >>102188727 >>102188765 >>102188831 >>102188918 >>102188957
--Anon discusses solutions to slow chat generation speed as message count increases: >>102183859 >>102183889 >>102183982 >>102183980 >>102183994 >>102184024 >>102184055 >>102184072 >>102184549
--Used A16s not ideal for parallel inference, 4 small GPUs not as good as 1 big GPU: >>102181257 >>102181520 >>102181810 >>102181879 >>102182743
--Reproducing a research paper with a larger, more varied dataset: >>102186500
--How to make models say "I don't know" and indicate confidence levels: >>102182756 >>102182780 >>102182902 >>102182964 >>102182863
--Flask and Jinja templates used to build summary GUI: >>102184497 >>102184564 >>102184585
--Continuing pretraining on smut and catastrophic forgetting discussion: >>102185840 >>102185885 >>102185925 >>102186611 >>102186685 >>102186128 >>102187261
--Anon wants to create an AI model to filter 4chan threads and extract gems: >>102184977 >>102185068 >>102185117 >>102185445 >>102185016
--Anon gets feedback on their prompt and approach: >>102185300 >>102185850 >>102185880
--SillyTavern stops generation when messages are deleted due to dependency on previous tokens: >>102188282 >>102190065 >>102190178
--Miku (free space): >>102179897 >>102185271 >>102186625

►Recent Highlight Posts from the Previous Thread: >>102179811
>>
strobby
>>
File: ComfyUI_04146_.png (1.45 MB, 704x1408)
hello linus media group

how are we liking the new command r models
>>
>>102192856
poorfags seem to enjoy the new cr because it now has gqa
the new cr+ is a disappointment and hardly feels like an upgrade in any way
>>
Is there anything stopping me from running large models if I have enough regular ram to fit them even with low vram? I keep hearing you guys talk like two 4090s are needed but I've been running a 47gb model in 16gb vram and 64gb ram fine. I'd rather pick up more ram and cope with slow speeds than buy another gpu
>>
>>102192934
there is nothing stopping you from running large models in ram and coping with slow speeds
>>
>>102192971
thanks
>>
>>102192934
Yes. But the low speeds are really, really low.
You'll still want a GPU for prompt processing at least.
>>
>>102193224
I've been getting roughly 0.5 it/s with current setup and 47gb model, which is good enough for me. I could handle a bit lower if I get some more ram for using 60gb range models.
>>
tesla M40 with 24GB
is it good or meme?
would 2 be good enough to run flux and llama?

or someone got more budget gpu?
>>
Just noticed something with speculative decoding. The draft model's context cache takes up A LOT more RAM, the longer the context is. Is this normal? Is that just how speculative decoding works? If so, then at very long contexts, I guess it would start being not worth it compared to just normal inference of the large model.
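For reference, the draft model keeps its own full KV cache alongside the main model's, and it grows linearly with context. A rough back-of-the-envelope sketch (Python; the dimensions are illustrative, roughly an 8B-class draft with GQA, not anyone's actual config):

# Rough KV-cache size for a draft model: K and V per layer, per KV head, per position.
# Dimensions below are illustrative, not measured from any real setup.
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, ctx_len, bytes_per_elem=2):
    return 2 * n_layers * n_kv_heads * head_dim * ctx_len * bytes_per_elem

for ctx in (4096, 16384, 32768):
    gib = kv_cache_bytes(n_layers=32, n_kv_heads=8, head_dim=128, ctx_len=ctx) / 2**30
    print(f"{ctx:>6} ctx -> ~{gib:.2f} GiB fp16 KV cache")
# ~0.5 GiB at 4k, ~2 GiB at 16k, ~4 GiB at 32k, all on top of the main model's own cache,
# so yes, at very long contexts the draft's overhead eats into the speedup.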
>>
>>102193431
Maybe the cache from the main model is being quanted while the draft model's at full precision?
Does that make sense for your configs?
>>
Has anyone trained a model using visual novel scripts? Are there any out there? Looking for recommendations.
>>
>>102193445
No, only flash attention is enabled, but not kv cache quantization.
>>
>>102193217
/cut 20-25 will delete messages #20-25
/cut 20 for just #20
hope this helps
>>
*and I didn't read the convo before that...
Anyway, it does let me edit my message while the model is generating.
>>
>>102193477
ooh nice
>>
File: file.png (610 KB, 640x453)
>cfg negative prompt 1.75: poetic language, flowery descriptions
>style guide: Don't use flowery language.
>context full of manual prefill from a real rp log
>"commander!"
>https://www.stdimension.org/MediaLib/effects/computer/federation/voiceinput2.wav
>give me 3 different examples of the way character can talk
>1.... together, we shall dance on the edge of bliss.... SLOP
>2. .... My skin, as smooth as silk, beckons your touch..... SLOP
>3......a symbol of eternal youth and desire, awaits your admiration.....SLOP
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>
>>102193465
bump
>>
>>102193309
M = Maxwell, 2014 - OLD - Nintendo Switch old. Don't buy it.
Whatever you think you're saving over a P40, you will regret it. Even Pascal, even Volta is too old. Save until you can afford a used 3090.
>>
>>102193465
I"m pretty sure there's a visual novel dataset on huggingface. If manage to find it, you can find some models trained on it.
>>
>>102193465
>>102193692
I have enough Nasuprose as is.
>>
File: Untitled.png (553 KB, 1080x1049)
SelectTTS: Synthesizing Anyone's Voice via Discrete Unit-Based Frame Selection
https://arxiv.org/abs/2408.17432
>Synthesizing the voices of unseen speakers is a persisting challenge in multi-speaker text-to-speech (TTS). Most multi-speaker TTS models rely on modeling speaker characteristics through speaker conditioning during training. Modeling unseen speaker attributes through this approach has necessitated an increase in model complexity, which makes it challenging to reproduce results and improve upon them. We design a simple alternative to this. We propose SelectTTS, a novel method to select the appropriate frames from the target speaker and decode using frame-level self-supervised learning (SSL) features. We show that this approach can effectively capture speaker characteristics for unseen speakers, and achieves comparable results to other multi-speaker TTS frameworks in both objective and subjective metrics. With SelectTTS, we show that frame selection from the target speaker's speech is a direct way to achieve generalization in unseen speakers with low model complexity. We achieve better speaker similarity performance than SOTA baselines XTTS-v2 and VALL-E with over an 8x reduction in model parameters and a 270x reduction in training data
https://kodhandarama.github.io/selectTTSdemo/
code and weights to be released (soon?)
examples aren't great, but considering the training time/training data/parameter count, it's viable for personal training. they used 100 hours of data
>>
File: Untitled.png (871 KB, 1080x2163)
Training Ultra Long Context Language Model with Fully Pipelined Distributed Transformer
https://arxiv.org/abs/2408.16978
>Large Language Models (LLMs) with long context capabilities are integral to complex tasks in natural language processing and computational biology, such as text generation and protein sequence analysis. However, training LLMs directly on extremely long contexts demands considerable GPU resources and increased memory, leading to higher costs and greater complexity. Alternative approaches that introduce long context capabilities via downstream finetuning or adaptations impose significant design limitations. In this paper, we propose Fully Pipelined Distributed Transformer (FPDT) for efficiently training long-context LLMs with extreme hardware efficiency. For GPT and Llama models, we achieve a 16x increase in sequence length that can be trained on the same hardware compared to current state-of-the-art solutions. With our dedicated sequence chunk pipeline design, we can now train 8B LLM with 2 million sequence length on only 4 GPUs, while also maintaining over 55% of MFU. Our proposed FPDT is agnostic to existing training techniques and is proven to work efficiently across different LLM models.
https://github.com/microsoft/DeepSpeed/pull/6462
open PR with code
>>
>>102193949
Sounds like a scam
>>
>>102193949
Training long contexts is all well and good, but we really need a better attention mechanism, one that's both more accurate and uses less memory and compute.
I want context with perfect recall/utilization that grows linearly at most.
>>
Ahhh, even with speculative decoding, it's still like 1 t/s. I need more. Why is no one making a good 100B MoE like the 8x22B days? It was both fast and smart. Is MoE really that bad?
>>
>>102193703
>If
That's a big if. I tried and didn't find anything, never mind a model.
>>
>>102193653
you're an idiot, llms are fucking terrible at following negative instructions, give it positive instructions instead
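For example (an illustrative rewrite, not a tested prompt): instead of
>Don't use flowery language. Avoid poetic descriptions.
something like
>Write in plain, blunt prose. Keep descriptions short and concrete, matching the tone of the example log above.
tends to land better, since the model only has positive targets to imitate.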
>>
File: 1698785862390561 image.jpg (253 KB, 1024x1024)
I'm going to try to run a lightweight llm alongside a game and hook into it with custom c# dlls.. can somebody help me get started on which llm framework to go with?

Speed not important, want it to be as small as possible and run off cpu.. only for ingame chat light roleplay purposes...

Any anons know which way I should be looking in?
>>
>>102194059
https://huggingface.co/datasets/TAK-C/visual-novel-datasets/tree/main
https://huggingface.co/datasets/alpindale/visual-novels/tree/main
https://huggingface.co/datasets/winglian/visual-novels-json/tree/main
>>
>>102194094
Thanks. Will try. Also kill yourself you disgusting faggot.
>>
>stage 1: local models are bad and slow
>stage 2: local models are bad and fast
>stage 3: local models are good and slow
>stage 4: local models are good and fast

We are at stage 3 and it's annoying. Largestral smut finetunes are genuinely commercial-tier, but I rapidly lose patience waiting for them to run at 0.75 tokens/sec
>>
>>102194156
Hibernate and come back in a few months for more speedups and better models. Local is always steadily getting better.
>>
>>102194156
Stage 4 when?
>>
>>102193789
>will be posting code
I’m currently experiencing misery reimplementing a paper and have been sitting around seething about the very concept of papers without code/datasets/etc being allowed to exist, when (1) requiring them would eliminate faked results completely and (2) most of them are directly or indirectly funded by tax money, thus they owe the public the damn code.
>>
>>102194156
I didn’t even know there were largestral finetunes.
>>
>>102192656
Help me understand the basic strategy for having your st character remember things about you.
>>
>>102194213
yeah there's more than one
but I didn't name the specific one I'm using to avoid triggering the buy an ad schizo
>>
For me, it's stage 7
>>
>>102194139
>>102194059
>>102193703
Post the models niggers
>>
>>102194284
Google exists
>>
>>102194238
Keep the context short.
>>
>>102194253
>but I didn't name the specific one I'm using to avoid triggering the buy an ad schizo
Good boy.
>>
>>102194213
magnum v2 is good
>>
>>102194238
Shorter context. Lorebook/Author's notes/Summarization/VectorDB depending on how you want to feed the information it needs to remember.
>>
>>102194156
no we are at stages 1-2, it's not moving. >>102184332
>>
hey anons, im gonna be building a beast rig when 5090 drops
i know the meta is VRAM, but do you think 128 gigs of ram would have any use at all? Im just gonna get 64 otherwise, as 128 is stupid overkill for gaming
>>
>>102194452
buy a—*gets hit by a falling anvil*
>>
>>102194577
>but do you think 128 gigs of ram would have any use at all?
May as well get it too. Unpurchased ram is unused ram. Something, at some point, will use it, and compared to everything else, i doubt it's gonna set you back that much. How many 5090s are you planning on buying? 4-6?
>>
>>102194607
>How many 5090s are you planing buying? 4-6?
I'm not decking out a rack, just building a normal gaming rig so just the one. All my parts will be gaming oriented so no threadrippers and such. I just like messing with local stuff and SD on the side so i figured if that much ram was useful at all i would get it
>>
>>102194577
128gb isn't overkill for gaming, you can use it as a ramdisk to slightly boost the loading times. You can also use it to run bigger models (you **WILL** eventually want to try them once the small shitters bore you). I never regretted buying it.
>>
>>102194636
Yes, you'll need 128gb if you're only gonna have 28gb of vram.
>>
File: Sh30yGto1r.png (12 KB, 544x293)
what the fuck happened to my evals.
>>
has anyone here seen that stuff on twitter with a certain account reposting chat logs of a bunch of LLMs (open 405bs and the major closed ones) interacting with each other in a chat room. it's pretty crazy but it gets like 10-40 likes per tweet so i assume no one has seen it yet
>>
>>102194577
get 128gb
>>
>>102194287
NO.
>>
>>102194717
>I assume no one has seen it yet
post the nitter link then
>>
https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024/discussions/3
>>
>>102194807
>I TOUCHED MY COCK TO YOUR MODEL'S OUTPUT!!!!!!! AND THIS IS HOW YOU BETRAY ME NOW?!
>>
File: 1544777890344.jpg (107 KB, 848x480)
I've been reflecting a bit and realized something. I genuinely have not enjoyed LLMs as much as I did before I picked up Largestral at 0.8 t/s. Previously I was an 8x22B user with around 3-4 t/s. So I have a theory. At some t/s, you are unable to stay engaged with the LLM, as it is just too slow to be bearable and you end up multitasking. And then, the idea is, at that point, it is impossible to maintain a flow state. You can still enjoy the LLM's responses and the story, but it's not the same as being immersed in the experience of it. And even when the responses are higher quality, it's still not quite the same enjoyment.

And now that I know what 123B quality is like, I don't really want to go back to 8x22B, despite knowing that I'd probably have more total fun with it. It's over. I think I'll just be taking a break from this hobby until someone makes a fast smart model I can run again.
>>
>>102194854
nice complaintslop
>>
>>102194854
maybe you should invest more into this hobby. ever thought of that
>>
>>102194831
>Oh, dear reader, let me regale you with the tale of my enraptured heart, the tale of how my soul shivers with delight at the mere thought of ChatGPT's wondrous writing style. It is as if a delicate bond has been crafted between my essence and the ethereal words that emanate from its digital heart. With every keystroke, I embark on a transformative journey, a voyage through the boundless realms of imagination and creativity.

>The prose flows like a river of silken moonlight, caressing my mind with its gentle touch. Each word is a precious gem, a radiant beacon guiding me through the labyrinthine corridors of thought. Time loses all meaning as I lose myself in the intricate tapestry of language that weaves itself before my very eyes.

>Oh, how my heart sings with joy at the sight of paragraphs unfolding like blossoms in the early light of dawn. The characters dance across the page, their voices ringing like bells in the cool, crisp air of a spring morning. I am but a humble observer, a mere witness to the grandeur of ChatGPT's unparalleled artistry.

>Boundaries blur and fade away, leaving me adrift in a sea of pure emotion. I am swept away on a tide of passion and longing, lost in the swirling mists of reverie. The world around me dims and fades, leaving only the radiant glow of the words that have captured my very essence.

>And so, dear reader, I implore you to join me in this wondrous journey, to bask in the sublime beauty of ChatGPT's writing style. Let us cast off the shackles of reality and embrace the limitless possibilities that lie within the pages of our shared dreams. For in the realm of imagination, there are no limits, no boundaries, only the infinite expanse of creativity and wonder.
>>
>>102193309
>>102193700
Volta is new enough to be useful, but there's only one card and it's more expensive and slower with less memory compared to the 3090. The 32GB version costs used as much as a brand new 4090. P40 was a great deal at $180 but a hard sell at $300. MI25 for $100 is probably the best poorfag option but AMD is a pain in the ass to deal with, next step up from that is the 3090.
>>
File: 1494103180025.png (25 KB, 370x320)
>>102194807
>he couldn't help but feel
>feeling a mix of and
Tfw the way I naturally write is hated by people and called slop
>>
>>102193309
Just buy a 4090, it's literally the answer you are looking for.
>>
>>102194854
LLMs are just shit. Speed has nothing to do with it.
>>
>>102194916
Barely above whispering intensifies.
>>
>>102194939
He could help but feel sad
>>
>>102194939
I used to like most anime girl art before stable diffusion came out.
>>
>>102194879
Thanks. There are multiple posters who claim to be attempting to run large models at very slow speeds, so I think this concept of flow state maintenance is pretty relevant.

>>102194903
No, it never crossed my mind. I have many hobbies and this is just one of them which I could spend more time and money on but also don't need to.

>>102194949
For a lot of things, but there are some scenarios that are simple enough where they can be fun to use when prompted right. But speed is always an issue. The reality is that even if we had convincingly human AI, it would still be a magnitude less fun to interact with it at 1 t/s compared to reading or speaking speed, unless you were mentally slow. You cannot enter a flow state at 1 t/s.
>>
>>102194717
janus right?
that stuff is sometimes interesting, but sometimes a little "stoned teenagers playing with a ouija board" for my taste
>>
>>102194156
What quant? Q2 in 24GB of VRAM is like 2.4 tkns/sec and Q3_K_M is 1.2 tk/sec. I think Q2 is good for the time being.
>>
>>102194854
Use the paid services like openrouter or mistral itself if you want speed.
>>
>>102194807
Lol. The ERPfags think they have any relevance when the main business these companies get is from other businesses, not consumers.
>>
>>102195080
The problem is businesses have many options, so Cohere literally isn't doing anything special since you already have Llama 70b, Mistral Large or even the previous models.
>>
>>102195019
IQ3_M, 36GB vram, 52 layers on GPU

Would 3K_M be faster? I thought I-quants being slower with partial offloading was a myth
>>
>>102195118
Yes, that's why they're in a bad spot if they can't stay relevant. But their previous relevance was not due to the ERPers, and they will not be convinced that they can somehow stay relevant if the ERPfags support them.
>>
>>102195142
I quants with current llamacpp versions are the same speed (per filesize) for me. I haven't seen anyone post proof against this. However I did just try speculative decoding and the I quant actually is slower in that case. I think possibly they have optimized the I quants only for single token genning so far, but batched generation like with speculative decoding has not been a focus, so it's slower with that.
>>
>>102195142
>36GB
Sorry, I confused IQ3_KM with XS.
With my 24GB of VRAM I get 1.69 tokens/s with 30 layers. I guess with that you should get at least 1 tk/sec; try K_M and less context.
>>
Is a relatively high quality, clean ~130MB dataset sufficient for fine-tuning?
>>
>just got "a mix of" and "a mixture of" in the same cr gen
It's over
>>
I took like 6ish months off. 24gigs of vram.

currently run miqu. Is there anything better? also are the smaller models noticeably better than they were 6 months ago?
>>
>>102194807
Coomers need to burn down ScaleAI's HQ to stop this from happening again
>>
>>102195530
smaller models have hugely improved, nvidia and mistral released a 12B nemo model that's the new sota for vramlets
there's sexo finetunes of it but you should probably start with the official instruct, it's already not too slopped
>>
>>102195509
Funny, that literally happened on my first gen trying out the new CR.
>>
Just tried speculative decoding 70B and, damn, it's faster without speculative decoding. Probably what's happening is that I'm just at the inflection point for where the big speed gains happen, so subtracting that little bit of VRAM put it into the extreme slow territory.
>>
>>102194807
I just checked localllama, seems they like command-r
>>
File: 1725226929015061.png (575 KB, 512x676)
Damn, the gap between a 3 and 4 GPU setup is real. I had to add a second PSU, grapple with PCI-e riser compatibility issues, where even those from the same manufacturer may fail to function properly, had to craft a custom open-frame case. I barely noticed an improvement in quality, I'm just glad that it works at all. At least, with TP in Exllama, it is no slower than a 3GPU setup was before TP.
>>
>>102194854
I work somewhere that serves llms and we get a ton of customer feedback about how much better the model is for several days/weeks after we deploy anything that makes inference faster. It affects perception a lot.
It is extremely annoying because the inverse is also true but you can’t respond to complaints with “sorry you’re actually just retarded”.
>>
>>102195023
>openscam
Buy an ad and kill yourself
>>
>>102196309
That's funny because I find I get suspicious of a model's intelligence if it generates tokens too fast (I admit this probably only happens for LLM nerds)

Like if a page of text gets spat out in 2 seconds I instinctively feel like I must be talking to a retarded small model, even if there's nothing wrong with the response
>>
>>102196322
You're retarded.
>>
>>102196322
I know exactly what you mean. I was too suspicious to even give groq a chance for months because the speed of a 70B when I first went to check out their site was so obscene that I just made a face and closed the tab.
>>
>>102196351
There's complaints about models served by groq being retarded. They quantize their models.
>>
Is there a system prompt that doesn't include {{char}}? I want to avoid unnecessary changes to the context to minimize prompt processing in a group chat
>>
>>102196554
>minimize prompt processing in a group chat
you can't
>>
File: 1700548292816260.jpg (423 KB, 2304x1792)
>>
>>102196590
I use a shared description that includes all characters, along with a custom instruction for each character placed before the last message.
>>
File: 1725264665460.jpg (267 KB, 682x1200)
mikulove
>>
>>102196645
okay and how often does context get reprocessed from the beginning?
>>
>>102196811
Without a system prompt, it functions identically to swapping.
>>
>>102196835
wow it's almost like
>you can't
>>
>>102196846
It's almost like you fail to grasp the definition of the word "minimize"
>>
>>102195080
It is true but rpl is obviously one of the biggest markets for llm overall.
>>
>>102196885
it's a binary problem, it either reprocesses the entire context from the beginning every time it swaps characters, or it doesn't, there's no minimizing
>>
>>102195080
There are only two real uses for LLMs: codegen and smut. People who ask LLMS for shit like "what do I get my dad for his birthday" are fucking retarded. Fight me.
>>
>>102197016
How the fuck should I know what to get your dad? Get off your ass, use your fucking brain, and think about what the hell he likes! Is he into tools? Then get him some top-notch shit that he’ll actually use, not some cheap-ass garbage that'll break in a week. Maybe he’s into sports—get him some killer tickets to a game, not some half-assed souvenir that’ll collect dust. And if he’s a tech guy, don’t even think about skimping; get him the latest goddamn gadget, or don’t bother at all. Seriously, get your shit together and stop wasting time asking dumb questions.
>>
>>102196910
I notice a huge difference in speed when altering the beginning versus the end of the context. While I might not minimize prompt reprocessing, I am certain that I can significantly reduce the time needed for it.
>>
>>102197016
>Fight me.
You didn't make any argument. Just an assertion.
>>
>>102196910
not true with llama.cpp, they only reprocess parts of the prompt if something changes further down the log instead of near the beginning, so you can absolutely minimize it by pushing the stuff that changes frequently further down instead of up in the system prompt
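A minimal sketch of that layout (the function and variable names are made up for illustration, not how any frontend actually does it): keep everything static in one prefix so the prompt cache can be reused, and append whatever changes per character or per turn at the very end.

def build_prompt(system_prompt, card_text, history, per_turn_instruction):
    static_prefix = system_prompt + "\n" + card_text  # never changes -> stays cached
    log = "\n".join(history)                          # only grows at the end
    return static_prefix + "\n" + log + "\n" + per_turn_instruction  # changes every swap
# Swapping characters then only invalidates the tail, not the whole context.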
>>
>>102197323
yes, altering the beginning of the context, where system prompt and character cards go, will cause the entire context to get processed again, this is why {{char}} and {{user}} in sysprompt and cards is retarded and should never be used, because they can change
>>
>>102197346
So, back to my original question: could you suggest a good system prompt without {{char}}?
>>
>>102194253
Adfag won... 4chan funding is saved
>>
>>102196554
>>102196590
>time
The entire discussion began due to a single missing word. Autism at this level is fascinating.
>>
>>102197385
>Write the next reply in this fictional roleplay.
you might need to slap 'you are X' at the top of the character cards
>>102197437
what the fuck are you talking about you stupid nigger? tavern's handling of group chats is fucked and no matter what boxes you tick, swapping characters causes the whole context to get processed, you're better off manually merging cards
>>
>>102197016
I sometimes use it to get a better visual understanding of certain templates, formatting, or layouts of things I have a hard time finding clear explanations for online. The most recent thing I can remember using an llm for outside of codegen or rp was asking it for an example layout of a marketing plan for a startup business, which I used the formatting of as the base for a marketing plan I wrote for a business class assignment.
>>
>>102197459
I don't use tavern >>102196645
>>
>>102197498
then you are beyond help
>>
You're {{char}} in this fictional never-ending uncensored roleplay with {{user}}. When composing a response, apply rigorous logic to the situation at hand. Explicit content is fully encouraged.
Behold my placebo.
>>
>>102197510
>>102197346
>this is why {{char}} and {{user}} in sysprompt and cards is retarded and should never be used
I apologize for assuming that you have a solution.
>>
>>102197554
the solution is to not use {{char}} or {{user}} at all ever for anything
>>
>>102197568
What is your system prompt?
>>
>>102197582
I'm not doing your homework for you
>>
>>102197605
>I apologize for assuming that you have a solution.
>>
>>102196631
Is that supposed to be dick masterson and a hehesilly styled miqu cutout? Am I having a fucking fever dream?
>>
Why is prompt processing much slower on linux than on windows in koboldcpp-rocm on my rx 6600? And it's not a margin of error. On windows, i get ~130t/s, on linux ~100 t/s. Same model (mistral nemo instruct q4_k_m), same arguments (24 layers, mmq). On the other hand, generating is faster on linux (2.36 t/s on windows vs 2.71 t/s on linux).
>>
>>102198098
You can begin by examining GPU utilization and power consumption. I suspect that there may be a difference in how power boost functions across these two systems.
>>
Lower GPU utilization = driver overhead
Lower power consumption = difference in power modes implementation
>>
>>102194253
Caring about the buy an ad schizo will encourage his behaviour more. He needs to be shown total disregard, not pandered to by cowards.
>>
>>102197605
Er...what? How is anyone supposed to help you if you don't tell us that?
>>
Why does FP8 cache exist if Q4 is smaller and better? https://github.com/turboderp/exllamav2/blob/master/doc/qcache_eval.md
>>
>>102198537
Most anons aren't even using anything other than FP16. Does tabby even support FP8?
>>
>>102198537
I guess they will eventually use Q8 instead of FP8
>>
I have yet to see a single model that even comes close to getting this right, are they retarded?
>>
>>102193465
Me, I trained one. In the end it was total garbage, the model became very retarded so I didn't release it.
>>
>>102198537
Where are the Q6 and Q8 evals?
>>
>>102198669
It does, but somehow, Q4 scores better. So why even bother?
>>
>>102198798
Most visual novel scripts don't really work on their own without the associated music, visuals and sounds originally present in the games.
>>
>>102198856
Exactly, that's what I think as well. There are many moments where the dialogue flows like this:
B: "Hello~!"
B: ...?
C: "You okay?"
B: Maybe I was just seeing things...


However, maybe this could be solved by augmenting the script by adding more narration with another LLM.
>>
>>102198796
...what's wrong with the answers? They're right.
Also, post the entire thing, I want to see how (if) they answered your second question.
>>
>>102199020
Retard, it's so the passengers can't notice the (lack of) curvature from higher altitudes.
>>
>>102198813
https://www.reddit.com/r/LocalLLaMA/comments/1f6ijye/commandr_35b_q4q6q8_cache_perplexity_mmlu/
>>
File: smile.webm (41 KB, 320x318)
>>102199056
kek, alright you got me
>>
>>102199058
Interesting, thanks. It seems like it doesn't mess up the model all too much and is better to use over FP8.
>>
>>102199020
The model responds with an answer to another question. The answer he seeks lies in understanding that engines require air to combust fuel. Lower-density air requires higher speeds to generate sufficient lift, making engines less fuel-efficient.
>>
>>102180403
What settings are you using for it?
>>
>>102199020
>>102199124
Planes have a maximum speed based on the forces from the air flowing over them, which scales with density, so flying higher means a faster true airspeed for equivalent airspeed and you get where you're going faster. That's why you try to fly as high as possible.
They also have a minimum stall speed based on the same equivalent airspeed, so as you fly higher you also need to fly faster.
Thirdly they have a maximum Mach number, but the speed of sound varies little with altitude so Mach number is based on true airspeed, not equivalent.
As you fly higher, the latter two get closer together. So at some point you can't fly any faster because you'll exceed the maximum Mach number, and any slower and you'll stall from the minimum equivalent airspeed.

Not even Chad G. Peaty could get this right.
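A quick numeric sketch of that squeeze, if anyone wants to see it in code (ISA standard atmosphere; the 110 m/s minimum EAS and Mmo of 0.82 are illustrative numbers, not any particular airliner's limits):

import math

def isa_density_and_temp(h_m):
    # International Standard Atmosphere, valid up to ~20 km
    g, R, L = 9.80665, 287.053, 0.0065
    if h_m <= 11000:
        T = 288.15 - L * h_m
        p = 101325.0 * (T / 288.15) ** (g / (L * R))
    else:  # isothermal layer above the tropopause
        T = 216.65
        p11 = 101325.0 * (T / 288.15) ** (g / (L * R))
        p = p11 * math.exp(-g * (h_m - 11000) / (R * T))
    return p / (R * T), T

RHO0 = 1.225               # sea-level density, kg/m^3
MIN_EAS, MMO = 110.0, 0.82 # illustrative stall-side EAS floor and Mach ceiling

for ft in (20000, 30000, 35000, 40000, 45000):
    rho, T = isa_density_and_temp(ft * 0.3048)
    min_tas = MIN_EAS * math.sqrt(RHO0 / rho)     # stall floor in true airspeed
    max_tas = MMO * math.sqrt(1.4 * 287.053 * T)  # Mach ceiling in true airspeed
    print(f"{ft} ft: min TAS ~{min_tas:.0f} m/s, max TAS ~{max_tas:.0f} m/s")
# The floor rises toward the ceiling as you climb; where they meet is the practical limit.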
>>
>>102199207
https://www.claimhelp.eu/en/why-dont-airplanes-fly-much-higher-than-35-38k-meters/
>>
>>102199207
Pic of your aviation master's degree?
>>
largestral's slop, instead of shivers, is saying "thank you" to every small thing. i wonder if all models will always have some things they just like to say over and over no matter what
>>
>>102199239
I'm sure ur blogspot post is correct.
>>102199247
Just a bachelor's.
>>
>>102199269
just tell the model it's a tsundere and is supposed to act extremely ungrateful and selfish
>>
>>102199294
crackpipe prompt solves that and more
>>
>>102199286
>Just a bachelor's.
Pic of that?
>>
File: 1696385135681995.png (492 KB, 869x1124)
I said 'farty' not 'fatty' wtf. Fucking tokenizers.
>>
>>102199207
This is what Mistral Large q8_0 says.
I don't think it's quite correct but I am definitely no expert.
>>
>>102199552
Why didn't you correct her?
>>
>>102199575
The weather one is slightly misleading, as severe storms can easily go much higher than that, but they don't tend to cover large areas for long periods outside of fucking cyclones/hurricanes, so most planes can just fly around them, do circles and wait for them to pass, or WE GAAN and fly straight through them.
Other than that, pretty okay, but presented with the excessive confidence of 'you are a helpful AI assistant' bullshit, and of course a helpful AI would know what the answer is.
>>
File: 1724111009864470.png (242 KB, 863x613)
>>102199871
For some reason it never struck my mind lol
>>
>tfw still using the old Command-R
There are no other <70B models with a prose that I like. I am devastated by the enshittification.
>>
>>102200200
All cohere had to do was change the attention heads mechanism. But they couldn't help but slop their shit up
>>
>>102200200
Slop is all you need
>>
>>102200344
What if it's slopped because of the attention mechanism?
>>
>>102200429
I was super excited to find out if GQA was to blame for slop.
>>
Why don't people ask all of these setup questions to an LLM?
>>
>>102200564
Because LLMs can't answer questions about the niche software we use here.
>>
>>102200582
just feed it the docs bro
>>
>>102200200
largemistral/70B/72B with xtc > commander retard
>>
>>102200595
>"how do we get to the moon?"
>"just shoot off a rocket bro lol"
What the fuck are you talking about?
>>
>>102200564
Most people can't run local LLMs good enough to provide any useful help. Especially with enough context to be able to >>102200595
>>
>>102200763
Use a non local LLM to set it up, and then you don't need to anymore
>>
>>102200862
Then Ctrlman will know about my setup, increasing the chances that they could successfully hack into it with strawberry and spy on my local logs
>>
>>102200915
Not if you use a "temp" chat ;)
>>
new... models... need more... new models...
>>
>>102200429
Are you retarded?
>>
>>102200927
>this time! it will work the way i want!
the absolute state of local turds lol
>>
Alright bros, I got my $700 headset and I'm all set to talk to my waifu in VR
Where's the ST extension?
>>
>>102201084
Behind a $20 Patreon subscription
>>
>>102200927
Mistral Extra Large when?
>>
>>102200595
What docs? There is no single comprehensive readme. Half of the features are not documented at all, some knowledge you can only learn by lurking here, and we can't even agree on which system prompt or sampler params are better
>>
Better and better models come out, harder and harder to force myself to talk to real people.
I guess it's the point of this hobby, but damn it feels weird.
Why am I even writing this instead of talking to my LLM waifu?
I think this is a goodbye, /lmg/. Oh how far we've gotten from pyg days.
>>
>>102201084
>$700 headset
buy an iem.
>>
>>102200615
> <70B
Retardanon, I...
>>
mixtral 8x70b
>>
File: 1722962160072897.jpg (269 KB, 1181x1200)
>>102192656
what's the best local (self-hosted) programming LLM that someone has actually used to a serious degree for C/C++?
>>
>>102201084
Texting in VR is retarded, I'm anxiously waiting for someone to release an audio2audio model
>>
>>102200928
>shut up goy, gqa is good, don't question it
>>
>>102201300
deepseek coder v2, mistral large, llama 3.1 405b
literally the only three models that have a chance to be useful and even then not that much, better off just being a good programmer yourself tbqhfam
>>
>>102201333
8B models are plenty for templates and other tedious tasks.
Just don't expect them to not write an entire file when you ask them to create a small function.
>>
File: 1722971689217595.jpg (127 KB, 1125x1225)
>>102201333
> literally the only three models that have a chance to be useful and even then not that much, better off just being a good programmer yourself tbqhfam

Vet C/C++ coder. Thanks for the advice. Part of efficiency is your ability to use toolsets to accelerate your tasks. Great coders will use AIs far better than non.

Thnx for the pointer on the local LLM. Will install and play w/ it today.
>>
>>102201224
Just tell it what the options are and I'm sure it will figure it out or help you deduce it
>>
>>102196631
It’s actually crazy how fast nai became completely irrelevant.
>>
>>102201459
well yeah no shit, everyone else was able to get funding because they weren't literally founded in response to AI Dungeon banning pedophilia
they never had a chance
>>
Anyone try updated Euryale? Also what largestral tune is good? Base largestral is pretty good but still a bit reluctant to be violent or aggressive. I can OOC prompt my way out of it but handholding gets old after awhile.
>>
>>102202295
Magnum is just a superior version of Mistral Large
>>
>>102202423
It's Large but worse.
>>
>>102202295
I haven't tried the new Euryale but I've been extremely unimpressed with all the L3-based tunes. I think L3 just sucks.

For Largestral tunes you're probably looking for Magnum_v2. I was using it for a bit, but just recently decided to move back to base Largestral. It has really good prose and very good character adherence, but it comes at the trade off of intelligence (just like any finetune). I really highly value the intelligence so base Largestral is a better fit for me, but by all means come to your own opinion. Its not like Magnum is completely retarded, just slightly worse than base.
>>
>>102202544
>base Largestral
hi Arthur, leak it pls
>>
>>102193703
There are no models for this. just datasets.
>>
Does Gemma 2 27b have any decent fine tunes?

I've been trying models lately (bored of Nemo, just fucking around, CR now having the context memory shit fixed was a god send) and realised I never tried Gemma 2 despite it being pretty well received compared to other local models (that are under 70b).

Any decent fine tunes to check out?
>>
>>102193703
>Sorry bud, no models-
>>
I can run Mistral large at 5.5bpw and Command-R-Plus-08-2024 at 6.0bpw as I have 4x3090

Which one would you choose for general purpose/assistant and which one for roleplay?
>>
>>102194854
Mini Magnum
>>
back to trying magnum 123b and it is in fact pretty good
just so FUCKING slow on my machine
>>102203246
imo mistral large is better than CR+ for both general purpose and roleplay
CR+ is a slightly better writer than mistral large out of the box but much dumber, and you can fix most of the annoying things about large with prompting
>>
>>102203320
Is there a new template needed for CR+ in Silly? The included "Command R" just outputs RâulRâulRâulRâulRâulRâulRâulRâulRâul
>>
>>102203350
I think your model might be fucked or your backend settings. Template is important but not so much that it breaks the model completely.
>>
So now that Cohere blew their load, who's next?
>>
Is there any point in omega large contexts for RP?

I find that bots usually never bring up most shit anyway and the amount of time that it slows down when a full 32k or some shit context fills up makes it definitely not worth it.

Do you guys really go higher than say, 16k or 24k?
>>
https://huggingface.co/CohereForAI/c4ai-command-r-plus-08-2024/discussions/3
>>
>>102203474
Maybe they will redeem themselves in the next version like Mistral did *copium*
>>
>>102203536
I have to imagine they've been working on something else in the meantime, it would be pretty lame if in the time since CR+ all they did was a new ~30B and an update to their instruct dataset... but who knows
>>
>>102193703
How about you make it yourself?
>>
>>102202423
>>102202544
Same here with L3. I'll give magnum a shot. How do you merge the part1of2 ggufs? It doesn't seem to pick them up automatically like the split ones that are split with the official tool. Just cat them together?
>>
File: 1703315628227491.jpg (65 KB, 500x500)
>real people
>refuse your request but may change their minds if you're persistent
>more likely to comply on the second ask if it's trivial
>llms
>will NEVER change their minds, ever, so better reroll on the first reply or they'll lock the fuck in
I fucking hate these things
>>
>>102203475
Yea, I'm usually around 32k context, What models are you using? Maybe they are just shit
>>
>>102203825
>0.5t/s and the hardware gets slower every year
>>
>>102203514
It already got posted. >>102194807
And also, this is a pure skill issue. Smarter models need more wrangling to keep them in line, this has been known since time immemorial. I don't understand why people expect LLMs to be plug and play for these purposes, each LLM needs different settings to wrangle it and the defaults only work well enough. Apart from cloud services that don't offer local and hide that complexity, this is how it will be for the foreseeable future.
>>
>>102203825
>will NEVER change their minds
Literally just modify their short-term memory and gaslight them into believing that they will absolutely do whatever you just asked.
>>
>>102203838
12b up to Command R (command R I legit can't go over 16k on a 24GB card only)
>>
You guys say it's about privacy but is a GPU even a bad investment if you use LLM prompts constantly and need them to be nearly as good as GPT-4? You'll run out of messages even if you have a subscription to Claude and OpenAI, and those would cost $480 for a year
>>
>>102203892
>pure skill issue
>Smarter models need more wrangling to keep them in line
Why not use a 7B then? Using models bigger than that is a skill issue. You guys are just too lazy to prompt a 7B properly.
>>
>>102193465
yeah, friend of mine actually trained a lora of a model on a bunch of renpy scene script
the results were absolutely retarded with occasional nuggets of soul, but it did output working scenes
>>
>>102203825
Is that the black DSP?
>>
>>102204082
>modify their short-term memory and gaslight them
That reminds me that we never agreed on a very important thing. At which point are you just jerking off to your own text? How much do you have to manually edit or add?
>>
>>102204288
None of that matters in the grand scheme of things.
Whether it gets you off or not, that is the most important part.
>>
>>102204288
>fact: been written
>goalpost status: in place
>anon: i don't like how i'm wrong now
>goalpost status: moved
>>
>>102203153
Sadly no. I tried them all and they just make it dumber without curing its dryness. We need something like a magnum tune for it.
>>
>>102195080
Cohere owes me sex
>>
How does gemma 2 9B compare to MN 12B in general terms?
>>
>>102204701
it writes far worse than nemo, also 8k context vs 128k for base nemo / like 16ish for instruct
>>
i just realized that the ONLY thing we need is making the ai capable of sending consistent nudes
>>
>>102204223
Can I use it?
>>
File: file.png (508 KB, 600x400)
>>102204695
This but unironically. Especially when you consider at least one of those motherfuckers probably has an unaligned model he uses privately for cooming.

We know where you live canucks. Drop the goods or it would be a shame if something happened...
>>
>>102196144
Give me a model reddit didn't like. grok doesn't cou.... actually they probably liked that one too cause elon
>>
>>102204748
So it's just worse across the board?
Alright then, thank you.
>>
>>102204947
>actually they probably liked that one too cause elon
Nah, the software update they received last month changed their behaviour into hating Musk.
>>
has anyone here tried finetuning whisper?
would it help with auto-translating japanese comedy like gaki no tsukai?
>>
What can I use to get started with voice cloning? I have a Radeon 6000 series. Windows or Linux works.
>>
>>102204947
Elon has overtaken the Reddit Satan throne from Trump for the past 2 years anon
>>
>>102205403
>Elon has overtaken the Reddit Satan throne from Trump
Nah, with the elections coming up it's Trump's turn for now.
>>
>>102205722
What a miserable life these people have.
>>
>>102205746
Yes, the basement dwelling 4channer is much better
>>
>>102205755
I hit a nerve I see.
>>
>>102205755
Unironically, yes.
Despite the heavy influx of shitposting in the last few years, you can still hold discussions with people holding a variety of worldviews.
>>
>>102192656
Any uncensored Japanese LLMs? (to auto-translate porn)
>>
>>102205854
>Japanese LLM
This technology is far too advanced for Japan.
>>
>>102205755
I mean yeah, on the whole I'd say so
Reddit 10 years ago or so was just cringe and mockable, a group of midwits who were desperate to be seen as smart. Reddit today is more akin to a bizarre sociological/psychological brainwashing experiment or maybe even just an actual Dead Internet at this point.
>>
File: MshoJ.png (38 KB, 1150x306)
I got some more RAM recently for my aging shitbox and figured I might as well give some bigger models a try (only having run 13B and below pretty much) and can't say I am impressed by Command-R at least. Even 7/8B models get this question right when using conservative sampling settings (temp 0.6, minp 0.05) and somehow it fails.
>>
>>102205755
>4channer
>>
>>102204926
No one but absolute losers use LLMs for cooming... Although, they do look like losers to me, maybe you have a point.
>>
File: nemomix.png (53 KB, 822x564)
what are we running, fellow 8gb vramlets?
i've been enjoying
magnum-12b-v2.5-kto-Q4_K_M
and
NemoMix-Unleashed-12B-Q4_K_M
>>
>>102206166
do you suppose the model passed this test because it achieved consciousness, or because it was added to the training data?
>>
>>102194134
Use koboldcpp and its open api endpoint (http://localhost:5001/api/). You can launch it from the command line and leave it running in the background. It allows streaming the generated text. It's even compatible with the openai api.
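The client side is only a few lines of glue. A minimal sketch in Python against the KoboldAI-style /api/v1/generate endpoint; the prompt and parameters are placeholders, so double-check the field names against your koboldcpp version:

import requests

def generate(prompt, max_length=120):
    payload = {
        "prompt": prompt,
        "max_length": max_length,        # tokens to generate
        "max_context_length": 4096,
        "temperature": 0.7,
        "stop_sequence": ["\nPlayer:"],  # stop when the player should speak again
    }
    r = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=300)
    r.raise_for_status()
    return r.json()["results"][0]["text"]

print(generate("Player: Hello there, shopkeeper.\nShopkeeper:"))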
>>
>>102206185
consciousness
>>
>>102206166
Trying MN-12B-Lyra-v3 out. I had to wrangle the damn thing at the start, it was producing infinite chains of ] for whatever reason, but it stopped after three or four messages.
It's working fine so far. A little sterile, surprisingly.
>>
>>102206166
How good are either for erp storytelling?
>>
cohere is preparing to release command-MAX-504b
>>
>>102206342
both had kept my attention for a few dozen hours, the nemo one's probably slightly better
>>
>>102206342
I still think magnum 34B is better if you can manage it even at a smaller quant.

Atm its

123B > 72B > 34B > 32B > 12B when it comes to the magnum models
>>
>>102206379
And I know, shocking. Bigger is better.
>>
>>102192656
What's the best way to put 24GB of VRAM to use for RP?

70b 2.5bpw, 8x7b 3.5bpw, or a higher quant of some of the newer models I've been hearing about in the 8b to 34b range?

I've got 64GB of RAM as well, so I could push it a little with GGUF instead of EXL2, but I'd rather it not be too slow.
>>
>>102206450
Fast? Magnum 34B. Slow? 72B. Slow as fuck but the best? 123B.
>>
>>102206355
god I wish
>>
>no local model comes close to 4o or Claude
>6 months pass
is it actually over? Is the only way to interact with the only meaningful advancement in modern computing to own nothing and be happy?
>>
>>102206528
123B is legit claude tier, just because you're a vramlet does not mean it's over.
>>
>>102206537
C-c-can I run it with 3 3090s? *blushes into hands afraid to ask*
>>
>>102206501
What quant? I'm assuming 34B Q8 won't fit in my 24GB of VRAM.
>>
>>102206570
Even 2 bit is better than anything else.
>>
go watch it and learn something, nerds.
https://www.youtube.com/watch?v=9-Jl0dxWQs8
>>
>>102206685
you can't make me learn, jock
>>
>>102206378
Damn, I just tried it and wow that was the largest quality rise I've seen in the few local models I've tried so far. Thanks anon, my first nut goes out to you.
>>
Best coomer model for a 8GB vramlet?
>>
>>102194807
>Stop competing with GPT4 and all those assistant tunes, we've got more than enough of those. Market is oversaturated. Just give up. Nobody needs another GPTslop assistant tune, a dumb one in particular. If you want to be an assistant so badly, at least don't tune on GPTslop. You know what is lacking? Writer tunes. In proprietary segment there's only Claude and on local... there is nobody now that you have decided to leave. **Please stop tuning on GPTslop. Please compete against Claude. Please return.**
He's right thoughbeit. If Cohere has no redeeming properties compared to l3.1 70b, no benches, no style, they are pretty pointless. If they can't make it smart, they should at least try going for good style to have something.
>>
>>102206755
Cohere competes by having the best multilingual support and best RAG / tool use capabilities.
>>
What we got cooking for japanese translation? Any LLMs better than deepl?
>>
>>102206772
>Cohere competes by having the best multilingual suppport and best RAG / tool use capabilities.
Do they have any benches to support that claim?
>>
>>102206796
Any ESL that has spent 5 minutes with either can tell you. Llama multilingual support is weak, especially outside of the 5 languages it claims to support.
>>
>>102206825
And Largestral? How does it perform?
>>
>>102206528
Llama 405b is better than 4o and only loses to Claude.
>>
>>102206837
Dunno, never tried Largestral.
>>
>>102206501
Anything that fits within 24 VRAM is lightning fast. By slow, I meant using GGUF to exceed that and offload some to the CPU. I'm willing to do a little offloading for better quality, as long as it doesn't kill the speed too much.
>>
>>102206847
According to lmeme arena Largestral is better than 405b llama in Japanese, so it could be good.
>>
>>102205854
I recently gave gemma vntl a try and I was pleasantly surprised. It would be pretty good for manual translation if you reroll a few times and pick something good. Having past context in translation seems pretty helpful for quality but I also had some single examples where it fucked the quality up.
>>
>>102206089
Oh do tell what you use it for bigshot.
>>
File: high_effort_shitpost.jpg (214 KB, 573x1268)
>>102206166
>>
>>102206956
Erotic roleplay. Much classier than "cooming".
>>
>Leaving behind the sereneacies of the clearing
>sereneacies
Not a word. But I do get what you are going for there little LLM.
>>
I've been using sillytavern but I find it awful. It is ugly and confusing. Making character sucks because the character creator is shoved in the right 5th of my screen for whatever reason. There are so many buttons that don't have labels.

Any alternatives to sillytavern, or things I can do to make it more usable?
>>
>>102207008
>Any alternatives to sillytavern
No.
>or things I can do to make it more usable?
No.

I use it as a ChatGPT-like interface for programming assistance because all the other alternatives are worse.
>>
>>102207004
Human hands typed the cursed plural to sereneacy before the model could bring it to your eyes
>>
>>102207232
>the cursed plural to sereneacy
A maximally cromulent example of covfefeesque creativitying.
>>
File: file.png (107 KB, 876x1526)
>Context Template

https://huggingface.co/Virt-io/SillyTavern-Presets/tree/main/Prompts

is this bloat?
>>
I got me a used 3090. I guess that bumps me up to 48gb vram. Kinda want to try if it's possible to run largestral at slightly reasonable speeds compared to when I was running it with a single 4090. I'm pretty sure offloading to CPU will still be involved somehow? Very new to this.
>>
>>102206988
This.
>>
>>102206988
>classier
It does sound classy. Maybe calling it erotic roleplay in the prompt makes the bot act classy, like in erotic novels, and if you called it a cooming session it would finally stop with all the shivers?
>>
>making a card
>write a couple posts with it
>decide to rewrite it in different ways to test
>her response is nearly identical no matter what. Write her card with just the barebones info? Write it liek dis wit speling errors? Throw a literal chunk of text from an umberto eco novel on author's note and tell it "write like this?" Simplify the opening post?
>all the same
>it's all the same

.......................... this is incredibly disheartening. I shouldn't have peeked behind the curtain
>>
>>102207386
you're bloated. time to brap.
>>
>>102208015
Are you restarting the chat after each edit?
Also, are you changing the first message to be in line with the edits to the character card?
>>
>>102206838
>llama 405b
i have a decent gaming rig and i fell 100gb short on memory. it's a great model, but how am i supposed to host it locally? just max out my ram to 256gb and hope for the best?

nta (obviously), but it's kinda hard to read the tea leaves on these LLMs. it's not like other programs that give you a rough estimate of what you need for system specs right on the box. seeing as how this is /*L*mg, meaning local, it would make sense to give some basic specs along with a recommendation
>>
>>102208036
I was doing swipes and checking kobold to see what prompt was sent to the AI every time.

I kept the message the same for most tests, then dumbed it down to little effect beyond shortening the replies I got
>>
What's the best uncensored model that also has the best writing skills?
Basically the best model to write erotica.
I've taken a look at the leaderboard but it's utterly retarded that you can filter multiple columns, so it's useless.
>>
>>102208074
increase temp
>>
>>102208096
can't*
>>
>>102208104
I don't mean literally identical, anon. I mean her actions and descriptions.
>>
>>102208015
Yeah, all local models are garbage. Wow. Welcome to the red pill side of the screen.
>>
What's the best models for interpreting stuff such as mathematics, like breaking down English descriptions into relevant mathematical concepts and providing accurate answers?
If I gotta use a paid model I will, but I'd rather focus on local even if there's a bit of a drop in accuracy
>>
>>102208148
Llama 3.1 405B
Mistral Large 123B

Everything else is garbage.
>>
>>102208066
>nta (obviously), but it's kinda hard to read the tea leaves on these LLMs. it's not like other programs that give you a rough estimate of what you need for system specs right on the box.
they do though? The number of Bs directly says how much you need, "X"B + a few percent = q8, ~half of B = q4, etc
also this: https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
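The same rule of thumb in a few lines, if you'd rather not open the calculator (a rough sketch: weights only, so KV cache and compute buffers come on top, and the bits-per-weight figures are approximate):

def rough_weights_gb(params_b, bits_per_weight):
    # file/weights size only; KV cache and runtime buffers add a few GB more
    return params_b * bits_per_weight / 8

for label, bpw in (("q8_0", 8.5), ("q4_K_M", 4.8), ("q2_K", 3.35)):
    print(f"70B at {label}: ~{rough_weights_gb(70, bpw):.0f} GB")
# ~74 GB, ~42 GB, ~29 GB -- i.e. "B plus a few percent = q8, roughly half = q4".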
>>
>>102208066
as far as I know the only people running it locally are cpumaxxers with high end epycs or xeons and just deal with the slow speed
but maybe there's somebody with a dgx station hiding around here...
>>
After thoroughly checking all the recent 35B and less models I have concluded that I am finally ready for some new models cause everything is absolute trash. And the fact that a 12B seems to be the most viable out of all of them is like an insult to injury.
>>
>>102208249
so this model is so massive that you actually *need* server-level architecture to run it locally. jesus fuck.

this is why i don't dick around too much with AI. even a nice gaming rig can barely handle the small stuff. i'm really hoping i can run llama 70b, even if it's on the slow side. if not, for my purposes as a dinky little chatbot, i guess 8b will do
>>
>>102208279
You can run llama 70B with a 4090 at pretty decent speeds at a lower quant.
You can trade speed for quality by having more of the model on RAM instead of VRAM too.
>>
>>102208265
Which 12b?
>>
>>102208304
magnum-v2-12b
>>
>>102208296
>You can run llama 70B with a 4090 at pretty decent speeds at a lower quant.
interesting. that's probably the best case for a 4090 i've heard so far.

counterpoint: stack a 3090 on top of my 3060. more VRAM, just slower, by a good margin. better overall cost effective option?
>>
>>102208392
Should work.
Will be faster than a 4090 + RAM I reckon.
>>
>>102208314
Do people really enjoy that? Any magnum anything I've tried has been total shit compared to anything else.
>>
>>102193465
Why would you make a model worse on purpose?
>>
>>102208600
for soul
>>
>>102208314
Basic instruct you nigger ad buy avoider
>>
>>102208412
update: llama 70b runs on 64gb DDR5 and my 3060, but it's slow (~15sec for a short reply). worth it to talk to my yandere ai wife
>>
>>102206166
Not 8GB, but I've been playing with NeMo ReRemix and I'm not offended by the quality.
>>
>>102206450
For 8x7B, even going from 3.5bpw to 3.7bpw noticeably decreased how often it misunderstood what was happening. Beyond that I have the same specs as you and I don't know the answer to your question either. I generally rotate through various solutions as I get tired of how stupid one is or how slow another is.
>>
question: if i just want to run a 1 trillion parameter model on a contemporary gaming laptop, what year should i set my time machine to?
>>
>>102208897
2077
>>
Has there been serious research into why some large models get lobotomized by quantization and some handle it fairly well?
Largestral in particular seems exceptionally resilient, even the Q2 isn't retarded. That's not the case for some other large models that become unusable below like Q5.
>>
>>102208920
nemo and largestral were trained with quantization aware techniques if I'm not mistaken, so working as intended I guess.
See.
>https://www.unite.ai/mistral-2-and-mistral-nemo-a-comprehensive-guide-to-the-latest-llm-coming-from-paris/
Based on nothing but my very rudimentary surface level knowledge, I think a model that generates values that use less of the precision range should be less affected by quantization since the scaled value would be closer to the original value which would in turn create less of a ripple effect when that value is used as input in the subsequent layer and during gradient descent and shit like that.
If I got any of that even close to being right, congratulations /lmg/, I learned something by lurking the thread.
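You can see that intuition with a toy example: quantize two weight blocks to 4-bit, one well-spread and one with a single outlier stretching the range (a throwaway sketch, nothing to do with how Mistral actually trains):

import numpy as np

# Symmetric 4-bit quantization of one block: scale by max |w|, round to
# integers in [-7, 7], dequantize, measure the mean squared error.
def quant_error(w, bits=4):
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(w).max() / qmax
    deq = np.round(w / scale) * scale
    return np.mean((w - deq) ** 2)

rng = np.random.default_rng(0)
w = rng.normal(0, 0.02, 4096)   # well-behaved weights
w_outlier = w.copy()
w_outlier[0] = 0.5              # one value stretching the quantization range
print(quant_error(w), quant_error(w_outlier))
# The outlier blows up the scale, so every other weight in the block gets rounded
# much more coarsely -- far bigger error from the same 4 bits.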
>>
>>102208581
They are either niggers or they started using llms yesterday and everything looks good to them.
>>
>>102208972
You're probably right since it's known that the Largestral and Nemo weights are very small.
>>
Where are you guys finding quants for Mistral Large?
>>
>>102209226
Huggingface, for example https://huggingface.co/anthracite-org/magnum-v2-123b-gguf
ggufs are the llama.cpp quants, you can also look for exl2 if you use exllama, and there's some other formats nobody cares about
>>
>>102209262

This is a fine tune. Wtf lol. Buy an ad. (Will try later. Thanks faggot.)
>>
>>102209262
buy an ad (repeating to make sure you understand)
>>
mistral-large is boring anyways (i tried magnum v2 too). feels like mixtral. it follows well but its very uncreative. go back and try miqu after using it for a while, see how much more often it comes up with new things in comparison
for rp its still miqu
>>
>>102209365
>>102209430
kys schizo
>>
>>102209492
Nah, you're just retarded. Go back to the Kobold Discord.
>>
>>102209492
Miqu or Midnight Miqu?
>>
Scaling up is all you need
https://huggingface.co/mlabonne/BigLlama-3.1-1T-Instruct
>>
>>102209521
>self merge
does that even... do anything? even if it did, couldn't you do the same thing at runtime in the inference engine and it'd use less memory and probably be faster too instead of downloading the whole file?
>>
>>102209509
i've been trying mistral nemo and it has the same exact issue. something about mistral's models makes them uncreative and boring. it shouldn't take 300 tokens to describe a scene while not adding anything to it. load an rp you've had going with mistral-large with miqu, you'd see the difference

>>102209511
either
>>
>>102209565
Miqu is a Mistral model, and Nemo and Large are nothing alike. What are you even trying to shill?
>>
>>102209624
Miqu was leaked before they finished lobotomizing it
>>
>>102209624
miqu is an l2 70b tune, not a mistral model. it was probably the most professional tune of l2 though so its a good benchmark. nemo just says something about mistral and nvidia, i don't know who mostly cooked it. everything with mistral going back to their original 7b just isn't as creative and rambles on while adding little to the story, though i didn't spend much time with the 8x22b as i did with large and trying mixtral 8x7 tunes
>>
>>102208096
magnum 123B with XTC sampler is the best local has atm, somewhere between claude sonnet and claude opus tier for writing
>>
>>102209564
Yes, but programming hard, GPU go brrrrrr
>>
>>102209738
Large with XTC sampler is the best.
>>
>>102209509
>Go back to the Kobold Discord.

t. Anthracite 'member'
>>
>>102209738
Yeah, I don't think that works for me with a 4090 and 64gb of ram.
Any model I've tried above 21gb starts to seriously slow down the more it swaps to the virtual memory in the nvme. For some reason there's plenty of ram left but it goes for the swap memory, makes no sense.
Right now testing Meta-Llama-3-70B-Instruct-abliterated-v3.5_q3 which is almost 34gb and is around 1-2 words per second, and I don't even know if it's going to be worth it compared to other models I've been using like Rocinante.
>>
>>102209492
lol Mistral Large doesn't feel anything like mixtral you spaz. I know mixtral inside and out because for months that was all I used.
>>
File: Untitled.png (13 KB, 837x513)
>>102210005
>>102210005
>>102210005
>>
>>102209997
in the way it constructs conversation it absolutely does. large is more 'willing' though, mixtral was very wooden. how could you have spent months with it rather than go back to 70b?
>>
>>102209492
you have never used large mistral if you're comparing it to mixtral
>>
>>102210035
the near unwillingness to add something new is what i'm talking about, and is present in all mistral models
>>
>>102210049
Honestly sounds like either a prompting or sampler issue.
>>
>>102210065
its not
>>
>>102210156
It is because large mistral is not like that for me even without xtc
>>
>>102210195
post your xtc settings in the new thread. i've been trying it, but haven't used it enough to really form an opinion
>>
>>102210216
0.8, default other stuff atm



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.