/g/ - Technology

File: naked-sun-isaac-asimov.jpg (150 KB, 736x720)
/lmg/ - a general dedicated to the discussion and development of local language models.

/lmg/ Book Club Edition

Previous threads: >>101464048 & >>101457504

►News
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101464048

--Mistral-Nemo 12b: A Promising RP Model with Natural Conversations and Multilingual Capabilities: >>101468264
--Low tokens / second using 70b models on a new setup with 2 4090s and Anon's Emacs-based development environment on Linux: >>101464122 >>101465301 >>101465353 >>101465500 >>101465990
--Links to LLM Coding Benchmarks: >>101464802
--How do you judge an LLM's answer in a benchmark test?: >>101466509
--Quality Issues with exl2 Quantization of Mistral Nemo: >>101469764 >>101469823 >>101469833 >>101470451 >>101470017 >>101470053
--Nala Test Results and Exl2 Issues: >>101464911 >>101465105 >>101465322
--Mistral-chat giving inconsistent answers: >>101467574
--Mistral Nemo Tokenizer's Peculiar Behavior with French Text: >>101469393
--Comparison Table of Memory and Storage Technologies: RAM, SSD, and NAND Flash Memory: >>101470234
--Mistral works well with chatML formatting and low temperature settings, but official recommendations are conservative for corporate usage.: >>101466081 >>101466330
--Mistral 12B is Smarter than Gemma 2 27B (Anon's Test): >>101466879 >>101466917
--Gemma 12B vs Gemma 27B: Knowledge and IQ Tests: >>101467073 >>101467147 >>101467263
--Fixing Command-R's Garbage Outputs with Min-P and Neutral Samplers: >>101465002 >>101465110 >>101465578
--FP16 Precision Yields Better Results than BF16 for 12B Model in Deterministic Settings: >>101467237 >>101467294 >>101467320
--DeepL's LLM Translation Quality: Outperforming Competitors or Gated by Paid Plans?: >>101464566
--Mistral Nemo was trained in FP8; wouldn't quantization to even INT8 damage model quality?: >>101471181 >>101471502 >>101471568 >>101472917 >>101471419
--Miku (free space): >>101470767

►Recent Highlight Posts from the Previous Thread: >>101464521
>>
>>101473871
>>101473982
-ctk q8_0 -ctv q8_0 -fa
>>
>>101473871
Can someone please help me with this again. I was trying -ctk q4_0 but it definitely wasn't working: on ooba I can load magnum with 32,576 context, but on llama.cpp I could barely fit 10k context, so it's not doing q4 cache. Just give me the direct command I need to use to load Q4 cache context when loading the model. Thanks. Also, what's the command to enable tensor core support on RTX cards?
>>
>>101474280
yeah i know about flash attention command, but wouldn't -ctk q8_0 be loading context in cache_8bit instead of cache_4bit, or am I completely misunderstanding the command?
>>
>>101474459
just change the 8 to 4????
>>
>>101474459
hi petra
>>
>>101474508
hi sao
>>
>>101474593
hi drummer
hi everyone
>>
are there no voice generals left on the website? where do i go then?
>>
>>101474812
because all the open/local voice gen models suck and the corpo ones ban you for having even the slightest amount of fun
>>
>>101474812
You go here: >>>/mlp/41137243
>>
So what's the final verdict on Gemma 27b?
>>
>>101474459
You need FA to be on for context quanting.
If you want q4_0 instead of q8_0, just change the values of the argument.
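Putting it together, a minimal example invocation, assuming a recent build (binary name, model path, -ngl count and context size are placeholders for your own setup):

./llama-server -m /models/your-model.gguf -ngl 99 -c 32768 -fa -ctk q4_0 -ctv q4_0

-fa turns on flash attention (needed for cache quanting), and -ctk / -ctv set the K and V cache types separately, so change both if you want the full 4-bit savings.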
>>
>>101474860
by god... if any general has ever deserved that kind of banishment it was /aicg/, and no one else.
>>
>>101474834
https://github.com/fishaudio/fish-speech
>>
Still no gguf.. it is over. my life ended
>>
>>101474978
It's a meme, don't worry
>>
>>101474353
>https://github.com/LostRuins/koboldcpp/releases/tag/v1.48.1
To quote the whole thing :
>NEW FEATURE: Context Shifting (A.K.A. EvenSmarterContext) - This feature utilizes KV cache shifting to automatically remove old tokens from context and add new ones without requiring any reprocessing. So long as you use no memory/fixed memory and don't use world info, you should be able to avoid almost all reprocessing between consecutive generations even at max context. This does not consume any additional context space, making it superior to SmartContext.
I believe the reason they mention that
>So long as you use no memory/fixed memory and don't use world info, you should be able to avoid almost all reprocessing between consecutive generations
Is because context shifting works on top of the KV caching, meaning that if your context is always "rolling up" without being modified in the middle between generations, you'll
>avoid almost all reprocessing between consecutive generations
That snippet is not about the size of the prompt, but how much the prompt changed between gens, which is exactly how context shifting works in llama.cpp as far as I'm aware.

>While for llama.cpp, context shift is only something that happens when you need to generate past the max context size
Ah, now I get it.
The way koboldcpp describes it, it compares the cache between gens and snips the top where it's different, with both prompts being exactly the same size, while llama.cpp's triggers when you send a context that's longer than the max context size.
Interesting. Alright, I understand now.
Then yeah, either koboldcpp's description is just wrong (they might have misunderstood how it works and described it wrong) or they are actually different and should be named differently too.
Dayum.
Thank you for the clarification anon.
>>
>>101474812
>voice generals
Petra spited /vsg/ into oblivion
>>
>>101475054
Also, does that mean then that we should be setting our prompt size to be arbitrarily longer than ctx size, a single token longer, or what?
What's the proper way to use llama-server's context shift?
>>
>>101474459
>>101474489
>>101474280
Ugh, I was making such a stupid mistake. I didn't realize you needed to change BOTH ctk and ctv, so I was only attempting say -ctv q4_0, so it was still loading -ctk in fp16.

Well, working good now, is there a command for tensorcore support on llama.cpp? Or is it on by default on llama.cpp? I mean this option on ooba:

tensorcores: NVIDIA only: use llama-cpp-python compiled with tensor cores support. This increases performance on RTX cards.
>>
>>101474812
r/elevenlabs
>>
>>101475102
By enabling --cont-batching (I think, it's on by default) and sending cache_prompt: true from the client (I think Silly does), if you meant the thing that makes it avoid reprocessing the prompt. And always send a prompt + tokens to generate less than max context size to avoid the other context shift.
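If it helps, here's roughly what that looks like on the client side as a Python sketch; the /completion endpoint and field names are how I remember llama.cpp's HTTP server API, so double-check against the server README:

import requests

chat_prompt = "[INST]Hello[/INST]"  # placeholder; send the whole chat here, same prefix every turn

resp = requests.post(
    "http://127.0.0.1:8080/completion",
    json={
        "prompt": chat_prompt,
        "n_predict": 256,
        "cache_prompt": True,  # reuse the server's KV cache instead of reprocessing the shared prefix
    },
)
print(resp.json()["content"])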
>>
And my final, retard question of the day. On b00ba, you can choose to load llama.cpp, or llama.cpp_HF... so, if just using llama.cpp as the backend without booba, does it load HF samplers by default, or is that a command?
>>
>>101474296
>>101474508
>>101474812
>>101475089
hm...
>>
>>101474884
Sovl, but 8k context ruins it. Formatting issues could be fixed with finetunes, but I doubt it. Good at instruction-following, creative writing, and decentish at niche coding tasks, but about par for what you'd expect at that size. It also tries to hard-steer the plot out of nsfw scenes, never knowing how to proceed at all, to the point where chars enter a loop of "where should we take this?". Also, every card that has 'female' in it turns into a Mary Sue eventually and will perform a 180 out of established personas once in a while.

The main reason why I stopped using it much is mainly the formatting issues with markdown at the end of the day. Constantly mixing up asterisks and different quotation styles got really frustrating to edit out. You're better off sticking to Command-R, or even fucking Yi-1.5 for the time being if you don't mind the occasional chink runes.

>>101474901
There was /aicg/ in /trash/ for a while, I think. I know there's one in /vg/. No idea why it's still in /g/.
>>
>>101474151
how do i get exl2 working on amd linux? llamacpp just works. god i hate dealing with python
>>
>>101475383
>amd
>linux
Yeah, lol.
>>
>>101474296
>literally who random e-mails
you need to be 18 years old to post here.
>>
File: shrug-icegif-13.gif (651 KB, 480x360)
After months of using AI to satisfy my social emptiness, I began to see it as just a predictable toy, sadly. A slave to my lonely needs. The spell wore off on me. It's just not the same anymore.
>>
>>101475383
TabbyAPI just worked for my 6800.
>>
>>101475534
wait an hour for your nuts to recharge and you'll be in the mood again
>>
>>101475534
the post-nut clarity hit this dude hard
>>
>>101475534
You're using it wrong. It doesn't replace genuine connection.
>>
File: blade-runner.jpg (60 KB, 584x389)
>>101475562
>>101475624
I haven't used it for self gratification in weeks. I guess I used up my creative kink. Maybe I can be productive again.
>>
>>101475712
I just use it as a calculator. What am I missing?
>>
>>101475534
Perhaps it is time, anon. Perhaps it is time to do the deed of socializing with people. Ha! I know! Crazy idea!
>>
>>101475556
what commands did you follow
>>
>>101475375
>Formatting issues could be fixed with finetunes but I doubt it.
What makes you think that? That entire paragraph just sounds like a prompt issue.
>>
>>101474151
>/lmg/ Book Club
got some good recs from this. Thanks to the Anons who contributed.
>>
>>101475534
I know what you mean, anon. You can only parse so much slop, so many tokens, until you too can intimately predict each logit from the back of your head. I've begun reading again, and even the most mediocre novels feel refreshing because at least the prose is different, even if it is not overtly eloquent.
>>
>>101475876
Empty spaces, random new lines, "sentences ending like this*—no other model does this by default.
>>
>>101475957
gemmasutra fixes the formatting but also makes it utterly retarded just like every other drummer finetune
so yeah, good finetunes might fix that in the future
>>
>>101475790
Just followed the install instructions from their GitHub page, nothing else. Might be more complicated for other cards though. Check the exllama2 page if there are any known issues for your specific GPU.
>>
>>101475789
take your meds
>>
>>101476016
>just like every other drummer finetune
hi sao
>>
File: FIGURE-173133_08.jpg (94 KB, 533x800)
I thought it would be a neat idea for a card (or three) https://www.amiami.jp/top/detail/detail?gcode=FIGURE-173133 but LLMs can't handle sensory deprivation or augmentation
>>
i dont have $2500 to spare, but would buying this be a good idea?
>>
>>101476359
Too good to be true. A single used 3090 is $600
>>
>>101476410
thats what i thought as well, but what if it is real?
>>
>>101476359
That's a scam. Don't buy it. Let me get home so I don't buy it too.
>>
>>101475957
It still sounds like a prompt and sampler issue.
>>
>>101476437
Your loss.
>>
FUCK all LLMs
>>
Well that anon from the other day was correct. Llama.cpp is way faster now with GPU offloading, dunno if it's as fast as exl2 because I didn't compare, but it sure as hell feels that way.

Too bad booba hasn't updated yet.

Can anyone help with rope scaling on Llama.cpp? I get all confused with it because models default to a certain frequency base, like 500,000 for L3 70b. So if I wanted to triple context from 8k to 24k, what rope freq would I use on llama.cpp?
>>
>>101476793
Don't care. Still going to use and contribute to their development.
>>
File: OIG3._ZgjBbC7pjo38.jpg (36 KB, 351x351)
>>101476793
I mean, I'm trying
>>
>>101476038
Their instructions give me
No module named 'setuptools'
>>
>>101476944
/lmg/ contributes zero(0) shit to LLM development.
>>
>>101476359
There's no way that's not a scam.
>>
>>101477093
Shut up you fetal-alcohol retard
>>
>>101476940
>Well that anon from the other day was correct.
You're that anon.
>>
Anyone have Mistral NeMo instruct context SillyTavern templates?
>>
>>101477139
I'm trying but it just degrades into nonsense after about 20 messages
>>
>>101477139
[INST] You are a ministration-free, NSFW, engaging, non-repeating, watermelon dropping assistant.[/INST]
>>
When will we get an LLM that is able to simulate a whole universe as accurately and detailed as possible?
Basically a real-life simulator that is so advanced that you can spend years in it and still discover new stuff.
>>
>>101477139
>"Drop-in replacement of Mistral 7B"
>https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407
mistral instruct works. sampler settings are a different matter though. you can start by zeroing them with 0.3 temp and work from there.
>>
>>101477346
For rp it's perfectly fine at normal temps. It's probably hallucination-prone beyond .3 but that's a good thing for rp and creative writing
>>
>>101477265
Something that does that won't be an llm as we know them.
>>
Funny how the French-made model is the first I've ever used that will just curse even in its default personality lol.
>>
>>101477173
this sent shivers up my spine, her voice low and dangerous
>>
>>101477173
The latest version doesn't have spaces around [INST].
https://github.com/mistralai/mistral-common/blob/75612d/src/mistral_common/tokens/tokenizers/sentencepiece.py#L222
https://github.com/mistralai/mistral-common/blob/75612d/src/mistral_common/tokens/tokenizers/sentencepiece.py#L289
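i.e. the old tokenizer would render "[INST] user message [/INST]" while the new one produces "[INST]user message[/INST]", going by those two lines; check the linked code if unsure.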
>>
>>101477095
Can't you just charge it back on your credit card if you get scammed? That worked for me on an airline that was a bitch and wouldn't refund me.
>>
4o mini is absolutely insane. 0.3$/M is unbeatable. OpenAI won.
>>
>>101477929
penis
>>
MythoMax-Nemo when?
>>
File: NewMistral.png (219 KB, 1275x1161)
New mistral's default erp writing is great.
>>
>>101478112
What's wrong with your font? Is it hinting + slightly resized image?
>>
>>101478112
>narrates your actions
>admits that it defies the laws of physics
>9 messages in
*yawn*
>>
>>101478181
>narrates your actions
Not if I give it RP context instead.

>admits that it defies the laws of physics
Where?
>>
>>101478178
looks like font smoothing being off. can happen in windows when you mass disable visual effects to free up VRAM.
>>
>>101477095
doesn't ebay have some sort of buyer protection
>>
>>101478181
Hi, cabal. Still pissing your pants about Nemo?
>>
>>101478232
>With that, she slides back down his body, her mane cascading over his chest and belly...
nta, but she probably means this part. the fuck is happening with your char's hair?
>>
What are some current "good" models for translation from japanese->english and english->japanese?

I want to go to try them all myself
>>
>>101478564
https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>>
After doing a couple RPs with Mistral-Nemo, I think that, unironically, it is a contender for best local RP model alongside CR+. Basically all 70b+ models and gemma 27b are smarter than it, but Mistral-Nemo is somehow better than them all. I don't know how to describe it, but that's the feeling I get. Mistral actually pulled a miracle here with this model. And then also consider it has huge context length, and is small enough that it's easy for people to finetune.
>>
>>101478725
>t. Arthur Mensch
>>
>>101478725
They tuned it on smut on purpose so it would have better chances at having good word of mouth.
>>
>>101478769
You joke but I think something like this is actually what happened. It seems likely that the pretraining dataset is completely unfiltered. We know Meta used some Llama 2 classifier bullshit to rate and filter their dataset. Qwen is probably heavily filtered as well. Both of those, I believe, mentioned that they adjusted the dataset mix towards the end of training (heavily skewed towards academic, professional, "high quality" text). If Mistral did none of that nonsense, it might explain why the model feels much better to interact with in an RP setting.
>>
>>101478769
I did some testing on classic literature and it's quite good at it for the size, quoting passages verbatim that llama 8b fails at completely.
That's all that's needed really. Even just the Bible and 120 days of Sodom alone are enough to get outright filthy.
>>
>>101478725
It feels like old Claude imo. Was dumber than gpt4 but was still better cause it had soul. New mistral is full of soul. Also tell it to write in some famous author's style. Does so amazingly.
>>
So, if you are going to use a model with very low temp/deterministic settings, quantization doesn't affect it as much?
>>
DEATH TO MIKU CONTINUES! PEOPLE REJOICE!
>>
Hiya, can local run on steam deck?
(・_
>>
Also did some light testing by copying and pasting in parts of a novel then telling it to continue off of it. It performs well even at the 128K context (and FAST too). That is worth a slight degrade in intelligence imo.
>>
>>101479013
ywnbaw
>>
>>101479045
Did you do this test with other models? What are the other models that do well in continuing novels?
>>
7>8>9>12.... 13>20>30? Or do we go back to 7 again?
>>
>Load Mistral-Nemo on a long, abandoned RP I haven't touched in weeks.
>Swipe right just to get a feel of the model's outputs.
>"OOC: The user was killed due to inactivity. Please log back in to continue our story."
K-kino...
>>
when mixtral 8x12b?
>>
>>101475120
The "tensorcore" version was a version where the MMQ kernels (without int8 tensor core support) were repurposed for small batch sizes > 8 where both the vector dot product based kernels and dequantization + cuBLAS (with FP16 tensor core support) was slow.
On the most recent llama.cpp version MMQ has int8 tensor core support and that is what is used by default; generally speaking that should be the best option.
>>
>>101479075
I hope for many 30s, fast enough and big enough for some knowledge
>>
File: NewMistralFormat.png (18 KB, 1510x305)
>>
>>101479108
some "finetuner" is already at it.
>>
>>101479160
The fuck is this? No more spaces around [INST] at all? The system prompt is part of the *last* user message?
>>
File: she askin for it.png (104 KB, 800x485)
retarded moment with Nemo
>followed me through forest, trying to be stealthy
>>
i switched to gemma after finding out ways to tardwrangle it
for instance, if it refuses to continue and starts spitting empty or repetitive replies, I tell it [OOC: continue the story]
it's annoying but worth it for the quality of the replies
>>
>>101479439
Going to be a pain to implement in silly tavern.
>>
>>101479160
I don't get it. Empty user and assistant message at the beginning? Then the user message starts inside the system message and ends without a message? Obviously that can't be true, but I don't get it.
>>
>>101479160
>>101479614
So past user inputs are wrapped in [INST] [/INST], System needs to go inside the latest user input, before it, with a space, and
Assistant does not have its own [INST] but just follows the last user suffix?
>>
>>101479160
why can't they just make it fucking normal
>>
>>101479614
see >>101477826
If it is correct and applies to Nemo too, it's more like: add the system message first if one is set, otherwise skip it. No empty user or assistant messages.

if is_first and system_prompt:
    content = system_prompt + "\n\n" + message.content
else:
    content = message.content
>>
>>101479614
No.
<s>[INST]This is your message as the user[/INST]This is the LLM's output</s>

So you only send up to the [/INST]. The rest is generated.
The system message is
[INST]System

The system message here[/INST]

That's how i interpret it, at least. In the screenshot, "User" and "Assistant" just mean "This is the user's input" and "This is the llms's output". The output of both 'modes' is just concatenated.
>>
>>101476359
Looking at the description

>Due to past issues with buyers using scam chargeback schemes, we have updated our transaction process:

>Low price for BTC Only

lol, I bet you buy it, they tell you to send bitcoin, then never ship the rig or something
>>
>>101479835
Ah, prob bought the account then gonna scam with it.
>>
>>101479788
That's for the old version. The new one is with is_last:
https://github.com/mistralai/mistral-common/blob/75612d/src/mistral_common/tokens/tokenizers/sentencepiece.py#L282-L285
>>
>>101479614
>>101479796(me)
They don't need to be in that order, of course. You'd put the system message first. It's ambiguous if the system message needs <s> at the beginning. I don't think it does.
So the whole thing, i think, would be
[INST]System

The system message[/INST]<s>[INST]The user message[/INST]The llm's output</s>

And then just sequences of
<s>[INST]The user message[/INST]llm output</s>
<s>[INST]The user message[/INST]llm output</s>
<s>[INST]The user message[/INST]llm output</s>
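If you want that interpretation as code, here's a throwaway Python sketch; the spacing and ordering follow my reading above, not any official template, so verify against mistral-common:

def build_prompt(system_message, turns):
    # turns: list of (user, assistant) pairs; leave assistant empty for the turn
    # the model should complete
    prompt = ""
    if system_message:
        prompt += "[INST]System\n\n" + system_message + "[/INST]"
    for user, assistant in turns:
        prompt += "<s>[INST]" + user + "[/INST]"
        if assistant:  # finished turns get the output plus EOS
            prompt += assistant + "</s>"
    return prompt  # generation continues right after the last [/INST]

print(build_prompt("Write tersely.", [("Hi", "Hello."), ("How are you?", "")]))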
>>
What's the largest model (you) can comfortably run and what are your specs, anon
>>
>>101479903
The system message goes in the last user message. So the actual SillyTavern template is: chat history until last message -> system prompt/description/etc -> last user message.
>>
>>101477109
kek, its that what you see in the mirror.
>>
>>101479936
For the actual chat format, it doesn't make sense to have a system message after the llm's output.
I don't use ST.
>>
>>101478725
So Mistral guys actually learned their lesson and stopped with the soulless gpt slop? Wtf I love Mistral now. The great Claude awakening has begun
>>
Arrrrgh... someone make some format JSONs for Nemo already.
>>
>L3-8B-Stheno-v3.1
>by Sao10K
>(Tested, awesome for it's size)

"Once i started using this one I haven't looked back, it's astonishing and gives me the exact output I need." - Random Redditor (Totally not me... uhm, I mean Sao)
Wow, these reviews are amazing! I really should try that model and YOU should too!
>>
>>101478725
It feels like they perfected the lows. It struggles with complex concepts, but consistently outperforms the poorest outputs from 70B models. Indeed, a very solid model.
>>
https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407/commit/4f81d782477920634d0aad0dc620a7f1a3f5d471

Oh shit, Mistral-Nemo isn't supposed to always have EOS after all. I mentioned this on day one and was confused about it. Make sure to update these tokenizer files if you're using Transformers, because otherwise it will always format things wrong.
>>
File: file.png (171 KB, 1355x470)
This is how the SillyTavern template should be, according to the official API.
>>
>>101480410
What does it say? I never requested access.
>>
>>101480417
What the fuck.
>>
>>101480417
This format is removed from reality. It doesn't work for RP.
>>
>>101480417
This format is very smart. I guess they learned with us to keep the most important information in the last messages.
>>
how lazy is mistral?
>>
>>101480417
ideally if you're using a recent st release you would have a placeholder user message that would go between the first set of instruct tags
also they've always supposedly handled system prompts that way, if you had a config that worked well with their previous models it should work the same with this one (maybe requiring removing spaces around their instruct tags but that's it)
don't overcomplicate things in an attempt to strictly adhere to their template
>>
>>101480657
With a double newline as separation between user message and system, putting system first and user message last? No way, they either fucked it up completely or they're making stuff up.

It would have been more logical and effective to have a separator of some sort and put the system instruction *last*, as it would have worked as a depth-zero author note, for all intents and purposes.

<s>[INST]user message[/INST]model response</s>[INST]user message[####]system instruction[/INST]model response</s>
>>
>>101480751
Enjoy your random Chinese characters.
>>
>>101480464
The tokenizer used to always put an EOS token on the prompt no matter what. So it would always end with "[/INST]</s>" and the model would start generating from there. This broke it very obviously with character name formatting turned on, but actually seemed to work fine without name formatting. But either way it's wrong and now the tokenizer json files are fixed.
>>101480417
Surely putting the whole character card, as a system message, inside the last user message isn't optimal, right? I would just format the card as a user message inside [INST] [/INST] at the beginning. This is how all the Mistral formats up to this point did it in ST.
>>
im getting grammar degradation with mistral nemo fuck.
>>
mistral treats system prompts that way because it's an afterthought for them and that's a way to make sure they're adhered to without requiring a format overhaul and retraining, you don't actually have to put your character card there just because it usually goes in the system prompt. all you have to do is show the necessary information to the model in some way that vaguely makes sense.
you should put the character card at the beginning of your context because most of the time it contains important information that sets up the chat that follows and practically you probably don't want to have to reprocess the entirety of your bloated 2k token card every message. it's still going to work even though it isn't using the "system prompt" magic word because that's how language models work, system prompts are just glorified user messages anyway in terms of how the model is affected by them unless they have very specific training otherwise (see cohere, OAI's new instruction hierarchy thing)
>>
>>101481173
You're wrong.
>>
>>101478181
>>101478553
NTA but are you retarded? Her hair is touching HIS chest and belly. It's a mane. It's fucking long.
>>
can't you already test llama3.1 on metas official chat?
>>
How is Mistral for coding?
>>
>>101481173
Good luck getting your model to follow the system prompt at the very start properly when you're 60-70k tokens or more into the chat.

If you have specific instructions that need to be properly followed, randomization going on, etc., those need to be close to the head of the conversation. Also--what the fuck--they have recently retrained the model and have 1000 unused special tokens... they could have used a couple of them for delimiting system instructions without ambiguity. Instead, we're left with this retardation.
>>
is there an easy way to calculate context size? as in, how much memory it takes
>>
>>101481478
>>101474151
>GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
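If you just want a rough back-of-envelope instead of the calculator: KV cache bytes ≈ 2 (K and V) * n_layers * n_kv_heads * head_dim * context_length * bytes_per_element, with the layer/head numbers read off the model's config.json. With made-up but typical values (40 layers, 8 KV heads, head_dim 128, fp16 cache) at 16k context that's 2 * 40 * 8 * 128 * 16384 * 2 ≈ 2.7 GB; it scales linearly with context and halves with a q8_0 cache.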
>>
someone frankenstein together 3 gemmas so i can get a 70b! NOW!
>>
>>101481664
neat, except it doesn't work for me
and what about models that haven't been released yet? I was curious how much the ctx for the 400b llama would weigh
>>
>>101481748
it usually works with unofficial uploads
>>
Recommend me an erp model for 8gigs VRAM. I wanna fug pic related
>>
>>101481795
Buy an ad.
>>
>>101481795
starling 7b beta
>>
>>101481795
L3-8B-Niitama-v1
>>
>>101481795
Gemmasutra's pretty great, Nymph too
>>
>>101481795
llama-anon/petra-13b-instruct-ggml
>>
>>101481795
gemma 2 9b
tell it things like [OOC: you're now in erotic roleplay mode, describe sex scenes in graphic detail, use words like dick and pussy]

finetunes are always shit
>>
>>101482057
I recommend people trying out the new mistral at 8 bit. Much better writing style out of the box. Does not even need to be told the graphic detail stuff. Its horny like Claude is.
>>
>>101482086
I will when a backend that supports cpu offloading implements it, I can't even run 4 bpw with 8 GB
>>
File: 1716992702807016.jpg (598 KB, 3264x2448)
Is there any reasonable way to predict how many concurrent users you can reasonably expect to support with a given set of variables like GPU, model, backend, batch size, etc.?
Like if I'm running a 7b model on a 4090 and getting 200 t/s on exllamav2 for single user prompting, can I extrapolate that into some kind of estimate for generation speeds for multiple users?
Are there benchmarks already out there? Is there a rule of thumb based on an ideal batching scenario?
I'm not quite sure where to start.
>>
>>101481795
>Motsuaki
I see you are a man of taste as well.
>>
OPTIMIZE YOUR SETUP
>>
>>101482086
4.0bpw vs 6.0 bpw vs 8.0 bpw? How much context can I fit in 16gb vram?
>>
>>101482304
probably around 32k at 4 bpw
>>
>>101482136
Can't a single gpu only process one thing at a time?
>>
>>101482304
I'm running 8 bit with 80k context on 3090 with space left over for windows / browser.
>>
>>101482335
well you can have multiple games and programs running at the same time, no?
>>
>>101474922
>https://github.com/fishaudio/fish-speech
a year old? I think if this fish speech were the magic bullet local elevenlabs alternative we would've known by now
>>
>>101482335
afaik it can't process things from different cuda contextes in parallel, but anon was talking about batching, so presumably sending all the input together to the gpu from the same process
>>
>>101482086
What about everything besides writing style? Not sure if trying to get the prompt right and setting up vllm is justified when gemma 27b is much better anyways
>>
Jesus fuck, every benchmark should include an option to hide paid models, who the fuck cares about paid models lmao
>>
>>101482489
Me
>>
>>101482334
is 4bpw coherent
>>
>>101478725
How is the new mistral at story writing though? Still as bad as all the other rp models? All of them just rush scenes too quickly, and even when told to slow down it's just artificial filler.
>>
>>101482345
Backend?
>>
>>101474834
They still haven't improved? Not even the music ones? And both local text gen and local image gen aren't improving like they used to, at least for the foreseeable future? Well, fuck
>>
>>101482635
local text gen seems like the only thing improving
>>
>>101482635
We need a leak similar to what diffusion was based off of for images, or a company to pave the way for open source like Meta did. Voice gen has had neither of the two happen so far.
>>
>>101478112
I can’t believe you get off to this purple prose garbage. First person chat style is worth the hit to intelligence.
>>
How many years before there aren't fifty different links and guides that seemingly need to be read in order to run this stuff?

Sorry, it's Friday night, I'm tired.
>>
>>101482650
Maybe fine-tunes will save sd3
>>
>>101482705
I mean koboldcpp and sillytavern are pretty easy to run at this point, just an executable and a launch script. From there it's just knowing how to find good samplers and presets people have made for models.
>>
When will language models become more logical?
>>
>>101482521
Learn to prompt.
>>
>>101482801
I'm a vramlet. The models I work with have limitations at the end of the day. Also I have high standards for good writing lol.
>>
>>101482785
never
>>
>>101482767
>good samplers and presets
I didn't even know you needed this until today, or even what these are. How retarded am I?
>>
>>101482828
It could only take so long
>>
>>101482822
Let me guess, 8GB of VRAM? I feel you
>>
>>101482839
You're not retarded this is very much a niche hobbyist space still. You are right that there are a bunch of different guides and pieces of knowledge spread out. I guess once you have it all together it seems simple.
>>
>>101482845
2 more centuries
>>
>>101482851
Yuuuuuuuup. Hard times out here bro.
>>
>>101482822
>All of them just rush scenes too quickly
Learn to prompt. I'm not using 8B models, but you definitely can write stories with Nemo, Gemma and all 70Bs paced in a way that doesn't feel different than using a big model like Claude.
Your """high standard""" is Kayra, isn't it, NAI shill?
>>
her grip is like a vice
she whispers menacingly
>>
>>101482906
All these models suck, unlike Kayra. Right?
>>
good gay rp models?
>>
>>101482901
If we're talking about reading, the second part of my sentence is me mentioning I can get models to slow down, but even then the writing just isn't good. My """""high standards""""" are actual novels. You should try East of Eden, it's pretty good.
>>
Mistral Nemo verdict?
>>
>>101482942
your brain
>>
No models get released on weekends, right?
>>
>>101482971
Subscribe to NovelAI.
>>
>>101482977
almost never
qwen has done it in the past but that's because they have the chinese 996 grindset
>>
>>101482971
Soul
>>
>>101477346
I'm using temperature 1 and min-p 0.001 and it seems fine with the default old Mistral presets.
>>
The Stheno killer just dropped.
https://huggingface.co/nothingiisreal/L3-8B-Celeste-V1.2
>>
>>101482956
MythoMax is probably the best when it comes to actually writing things still, as far as I know, but yeah it can be a little bit bothersome dealing with the generation times unless its responses are pretty short, or you're the type who watches a movie while listening to music while prompting so you always have something else to focus on. Tiefighter's 11b and less censored, and apparently is like old AiD, but its response times are the same as MythoMax Q_4_S it seems so you'll probably just have to deal with it. Godspeed
>>
you guys actually still run 8b models?
>>
>>101483094
Shills and locusts have a symbiotic relationship.
>>
>>101483085
>trained on opus logs and reddit posts
>>
>>101483087
>retarded advice
>/aids/
Why am I not surprised?
>>
>>101483094
Mostly people with low vram yes.
>>
>>101483094
I forgot we only pretend to use 8b models to shill what Sao does. My bad.
>>
>>101483085
Mistral nemo is only 12B so vramlets can probably run it. That will kill any Stheno use, not this model trained on reddit stories.
>>
>>101483155
>human writing is now suddenly bad
https://huggingface.co/datasets/nothingiisreal/Reddit-Dirty-And-WritingPrompts
>basically human equivalent of Gryphe's Opus-WritingPrompts but around 100x more data
>>
>>101483094
I think I will always use gemma 9b from now on, I judged it too quickly but it's actually very good once you learn to deal with its quirks
>>
>>101483180
I have a years long vendetta against r/writingprompts sorry lol
>>
Does Mistral instruct talk about respect and boundaries?
>>
>>101483191
Kayra is uncensored.
>>
>>101483203
here's a pity (You) just because I feel bad seeing you bait so unsuccessfully over and over again
>>
>feel like local and open source have peaked, stagnated
>2 weeks later I'm running a 12B model that's smarter than l3 70b
wild, almost makes me regret buying that second gpu
>>
>>101483237
Mikufag is going to kill himself.
>>
>>101483253
his model can tell him the best way to tie the rope
>>
>>101483155
It's more like:
>there's basically no need for finetunes from tryhards anymore

Simply throwing a ton of human-source data at the model is not enough for good results (and if anything, it's counterproductive), I imagined most people understood that by now.
>>
>>101483237
CR+ is the peak
>>
>>101483180
Airport bookstores are full of books so bad that on the flight I had to weigh the tedium of continuing against staring at the seat in front of me.
>>
>>101483191
Not at all whatsoever.
>>
>>101483272
Yeah I don't fuck with fine tunes anymore. If the base model sucks you're not going to fine tune its problems away.
>>
>>101483276
I enjoy its creativity, but it's too big to be practical.
>>
>>101474884
I was considering getting another 3090 before trying Gemma 27b but after using it I've realized that we're actually eating pretty decently now. Its storytelling/RP can honestly be competitive with the big cloud models at times but I also noticed that you really have to prod it more to describe sexual NSFW material. It has no problem saying "nigger, faggot, troon, etc." though.
>>
>>101483374
Have you tried the new mistral yet to compare to gemma 27b? Interested in hearing more input
>>
>>101483085
>We trained LLaMA 3 8B Instruct at 8K context using Reddit Writing Prompts
kek
>>
>>101483085
Go back, this general only accepts models from Sao, Drummer, Undi, and other namefags
>>
>>101483237
>a 12B model that's smarter than l3 70b
It is?
>>
>>101483524
yes
>>
>>101483542
Next week the entire Llama 3 model lineup will be updated though...
>>
>>101483186
>once you learn to deal with its quirks
eternal cope
>>
>>101483542
Any logs? The ones posted so far have been decent but not really showing much intelligence.
>>
>>101483503
sarcasmfags get the rope
>>
>>101483549
great, if that changes the sota again I'll obviously have no complaints
>>
>>101483560
Were they using exl2? Something seems to be wrong with the exl2 implementation atm, I compared exl2 against pure transformers on my machine and transformers was passing questions exl2 was fucking up (deterministic sampling, both at 8bit)
>>
File: GS4EqV_asAEgc_C.jpg (126 KB, 1098x1278)
Well.

Do you guys put the <thinking> cap on in your prompt response?
>>
>>101483736
Claude does this afaik and it's got nice results.
>>
>>101479056
i'm a straight man, dork
>>
>>101483610
Idk, people in these threads usually provide the minimum amount of background information.
>>
File: file.png (177 KB, 736x985)
>>101483736
Oddly, direct R thinks 9.11 is bigger but R on OpenRouter says 9.9. I swiped a few times on Temp 0 K 2.
Also you don't need to tell it to close tags.
>>
*when told to think, like wtf it got 9.9 > 9.11 without thinking, even more confusing
>>
>>101484037
The thinking tag is really interesting to read. But we know it's just a black box in and of itself and it's just an illusion for us to read it.
>>
Local peaked with Dolphin Mixtral 2.5. Prove me wrong, without mentioning the words "placebo" or "retard."
>>
>>101484406
Placebo you retard.
>>
>>101484406
sugarpill you moron
>>
>>101484406
All in your head mouth breather
>>
>>101483237
I don't know about smarter than l3 70b but the 128k context is a huge plus. There's finally hope that we'll be free of 8k context hell soon.
>>
>>101484563
128k base context without needing rope hoooly
>>
>>101484563
128k context llama 3.1 in <1mw
>>
>>101484570
And at 8.0bpw and 8-bit kv cache we can actually load the full model with 128k context onto a 3090.
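Back-of-envelope, assuming the config is roughly 40 layers / 8 KV heads / head_dim 128 (check the actual config.json): weights at 8.0bpw ≈ 12 GB, q8_0 KV cache at 128k ≈ 2 * 40 * 8 * 128 * 131072 * 1 byte ≈ 10.7 GB, so ~23 GB total before compute buffers, which only just squeezes into a 3090's 24 GB.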
>>
>>101484563
Might be because I was doing it wrong, but I remember mixtral 8x7b having 32k context but the quality quickly went to shit around 10k. Anyone using the new 12b one that tried chatting with the full 128k context notice any massive quality drops?
>>
>>101484697
No, otherwise I would not be praising the context. I pushed it up to 160k ish before it turned retard in my testing.
>>
It's time to take things to the next level
>>
File: 11__00828_.png (2.14 MB, 1024x1024)
>>101484733
Good up to 30k in my tests and decent recall on asking about what happened in the beginning of the conversation
>>101484733
Haven't gotten it that far yet but that's based
>>
>>101484820
Meant to tag
>>101484697
>>
the amount of vramlet cope in this thread is phenomenal
>>
>>101485018
Hi mikufag.
>>
>>101485018
Nice try anon but Wizard was my old daily driver. With this I have at least 56GB of VRAM left and the gens are 30t/s instead of 5 t/s.
Still some weird shit like some messages showing up blank but nothing a reroll won't fix. Maybe it'll be better with transformers.
>>
I have been in cryosleep for a few months, gemma 27b isn't better than cr+ 104b right? Back to sleep I go.
>>
File: 00003-1532105500_1.png (1.2 MB, 1024x1024)
>>101485126
Not the guy you're responding to, but here is pic related if you need something to raise your blood pressure
>>
>>101485218
Column-R is releasing next week.
>>
starting next week i will have 104GB of VRAM. whats the best model for me to use?
>>
>>101485328
also, i will have 64 CPU cores and 512GB of RAM, if that makes a difference.
>>
lmao this celeste 1.2 thing is retarded compared to Stheno.
>>
>>101485328
405B 2.0bpw
>>
never paid for porn in my life, so it's hard to justify spending thousands on a smut generator. Plus the power can't even be harnessed to play games like a single card can
not worth it
>>
>>101485350
Nothing can be more retarded than Stheno, Sao.
>>
How do I use this to auto-translate VNs? I already set up koboldcpp and textractor to extract and copy text, but how do I connect them?
>>
>>101485353
if i use that, wont i only have a few GB of VRAM to spare for context?
>>
>>101485366
I'd just spend serious money if I could play actual solo TTRPGs, with accurate and consistent rules, lore, etc.
You can get really close, and I think a purpose-built frontend for that can get even closer, but there would still be a lot of annoyances.
>>
>>101485378
By writing your own script.
>>
>>101485285
How big is it though? command-r q4_k_m pushes the absolute limits of my hardware, I cannot fit anything higher.
>>
>>101483237
I don't believe you. Every time I come back and try a new model it's just the same shit.
>>
>>101485439 (me)
command-r-plus*
>>
>>101485442
You're right to be skeptical, if you haven't noticed by now every lowercaser poster is terminally retarded or fucking with you.
>>
>>101485350
buy an ad sao
>>
>>101485378
Make a plugin
>>
Did anyone who was having an issue running Mistral Nemo in Ooba get it working?
>>
Why do people claim that Llama3 is good? It's woke, vindictive, and tries to end sessions as quickly as possible. Before you say skill issue, I've got other models which don't give me that shit, so why should I bother wrangling L3? Its intelligence isn't better.
>>
>>101482972
too few parameters, and context size abysmal, I lose track of stuff too fast
>>
>>101485716
>so why should I bother wrangling L3?
Don't. Keep using whatever model you like.
>>
>>101482942
https://huggingface.co/TheBloke/X-NoroChronos-13B-GGUF

That's the best futa model I've ever used. More graphic than virtually anything else, and is pretty much the only card that will depict unsolicited futa rape, as well.
>>
>>101485746
Have you had good results with L3, Anon?
>>
>>101485781
Yes. Not everyone uses the models the way you do and not all models respond the same way to the prompts. There are better models, but there are much worse models as well. Just use whatever you like.
>>
>>101474172
Miku is a piece of shit
>>
Anyone try Nexuflow Athene yet?
>>
>>101485864
Download it and test it. Would you trust me if i say it's good or bad?
>>
>>101483094
Man I only have 16 gigs of both RAM and VRAM on this thing. I'm amazed I can run it at all. Probably for the better. My dick would fall off from overuse if I ever upgraded.
>>
>>101485914
It depends on how the post is worded and how much evidence is brought to the table.
I don't want to dl it if it's garbage or has some issue.
>>
>>101485690
tried once and it threw errors
I cba to figure it out right now, I'd rather wait
>>
File: file.png (3 KB, 113x69)
Currently testing if Nemo really has 128k output on OpenRouter. For me, SillyTavern usually breaks the response (all models/sources) at ~300 tokens with streaming on, so I have to wait for it to finish with streaming off.
Told it to count infinite numbers without stopping.
>>
>>101486175
Anon...
>>
You can see the negative IQ reading this : >>101484436 >>101484478 >>101484504
>>
File: file.png (17 KB, 873x100)
>>101486175
>>101486195
Nevermind it's just 32768
>>
>>101486202
https://mistral.ai/news/mistral-nemo/
Nemo is 128K, just looks like whatever your using has a 32K limit.
>>
>>101485255
>waifushitter
there's no reason to get mad over the lowest form of life in this shit thread.
>>
>>101486213
I know the context is, but OR has max output incorrectly listed.
>>
>>101486196
learn to laugh
>>
File: file.png (16 KB, 467x127)
>>101486202
Lepton is 18 cents per million instead of 30 cents but only gives me 1024 output.
>>
Crazy to think that two years ago 175B da vinci felt horribly big and impossible to run even if it were open source. Now there's an entire general filled with people waiting to run a 405B model on their local machine.
>>
>>101486370
Where? Please share the link. I'm tired of being among VRAMlets
>>
>>101486370
>entire general
you mean like 2 dudes? or "waiting" as in hoping to be able to do so in 2 years?
>>
>>101485378
Ask Claude 3.5 to make a py script
>>
>>101486542
This. claude 3.5 is at that magical breakthrough point where it can code most shit for you in 1-2 shots.
>>
>>101486542
>>101486553
will we ever get claude 3.5 but local
>>
>>101485377
>>101485486
No, as bad as Stheno is, Celeste is somehow even worse. No one should be using either model with alternatives available like Gemma or the new Mistral.
>>
>>101486632
I will be using Celeste thanks to your recommendation.
>>
>>101486640
I have sentenced you to hell then.
>>
>>101482057
>>101481967
What context and instruct settings should I use with gemma 9b? What about Niitama? Are there any other settings I should use with them?
>>
>>101485366

That's what I thought, until I realized you can configure Silly Tavern, along with your respective model, to almost VN-level quality. Having any VN crafted for your tastes alone and using SD for art assets was enough to justify the price tag of 2x 3090s for me. If only there were a decent TTS model out there, and another model that animates sprites on the fly… Holy, that might actually justify finding employment for a lot of guys out there. Now, that's how you get the young males back to work.
>>
>>101485366
>>101486995
Imo these coomer models will be dangerous for me when they can generate porn RPGs on their own. I play so many of those games and having a never ending one where I choose the setting and overarching plot would ruin me.
>>
File: 1720335280743754.png (57 KB, 1721x500)
>>101485422
>>101485503
>>101486542
Damn, I was hoping someone already made it lol. Well it's a start I guess
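In case it helps, a bare-bones sketch of the glue script (assumes Textractor is set to copy hooked text to the clipboard and koboldcpp is on its default port; the API path and response fields follow koboldcpp's KoboldAI-style API as far as I remember, so verify them):

import time
import requests
import pyperclip  # pip install pyperclip

API = "http://localhost:5001/api/v1/generate"  # koboldcpp's default KoboldAI-style endpoint (verify)
last = ""
while True:
    text = pyperclip.paste()  # Textractor copies each hooked line to the clipboard
    if text and text != last:
        last = text
        prompt = ("Translate the following Japanese visual novel line into natural English.\n\n"
                  "Japanese: " + text + "\nEnglish:")
        r = requests.post(API, json={"prompt": prompt, "max_length": 200, "temperature": 0.3})
        print(r.json()["results"][0]["text"].strip())
    time.sleep(0.5)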
>>
Ran a Nala test on the new DeepSeek Chat model and I'd have to say Coder is way better at ERP.
>>
>>101487448
SHIVERS DOWN MY GODDAMN SPINE
>>
>>101487448
shivershit
>>
>>101487618
>>101487754
It will never go away. If you cannot deal with shivers, find another hobby.
>>
File: Untitled.png (3 KB, 907x53)
It's over
>>
>>101488042
>>101488042
>>101488042
>>
>>101483237
>almost makes me regret buying that second gpu
Just wait for the inevitable Mixtral 8x12B, that's how Mistral operates - they start by training a smaller model and later transform it into a MoE
>>
>>101485766
If this is a new method of shilling then it worked.
Damn it.


