/g/ - Technology


File: 1680307620630.jpg (481 KB, 2048x3072)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101607705 & >>101600938

►News
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: OIG1.gLxm3isVEvwv1M.jpg (155 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101607705

--QuantFactory issues and bartowski recommendation: >>101611032 >>101611050 >>101611076 >>101611091 >>101611104 >>101611185
--Mistral Nemo setup and optimization discussion: >>101609385 >>101609403 >>101609492 >>101609497 >>101610041 >>101610089 >>101610376 >>101610861 >>101609509 >>101609818 >>101609860 >>101609913 >>101611610 >>101611674 >>101611719 >>101611790
--Logs: Meta's llama-3.1-405b-instruct: >>101610080
--Local AI models preferred over proprietary ones due to privacy, reliability, and customization concerns.: >>101607953 >>101608039 >>101608063 >>101608199 >>101608285 >>101608355 >>101608377 >>101608450 >>101609724 >>101609957 >>101610284 >>101611916
--Rinna releases 70B LLaMA 3 Youko for Japanese tasks: >>101611818
--Example dialogue and speech style in character cards: >>101609337 >>101609380 >>101609694 >>101609754 >>101609773 >>101609794
--Ooba high ram usage and modern frontend development: >>101610450 >>101610485 >>101610496
--Obedience in local vs public models: >>101609499 >>101609516 >>101609540
--Linux error message troubleshooting: >>101610587 >>101610602 >>101610623 >>101610634 >>101610923 >>101610649
--Gamemakers using ERP chatbots and AI paranoia: >>101609165 >>101610866 >>101610975
--Anon doesn't understand how LLMs work: >>101607886 >>101607958 >>101608547
--Running models locally with AMD and Windows, ROCm status, Kobold build recommendation: >>101611846 >>101611869
--Anon gets broken output and anons suggest using rewrite extension, regex, stopping strings, BERT, or Phi3 mini: >>101612058 >>101612077 >>101612079 >>101612092 >>101612134 >>101612121
--Why is Meta lagging behind Anthropic?: >>101612184 >>101612219 >>101612244 >>101612565 >>101612582 >>101612596 >>101612620 >>101612228 >>101612293
--Motherboards for stacking 3090s: >>101607858 >>101607961
--Miku (free space): >>101608791 >>101610851 >>101610993

►Recent Highlight Posts from the Previous Thread: >>101607712
>>
>>101612990
>Why is Meta lagging behind Anthropic
Why did you include the connections benchmark discussion into here
>>
File: 1486655370478.png (197 KB, 499x427)
Fine, ill make my own silly tavern, with blackjack and hookers
>>
where the iq1_xxs for mistral large?
>>
>>101613046
nigga just talk with yourself at this point
>>
Opinions on OpenELM/Phi-3?
I can run them on my laptop's cpu, and I can use the guidance python library with no problems, but what could I make?
From a coder's perspective, no-coders/"ai coder saar" need not apply
>>
File: i am cum.png (166 KB, 1864x419)
>>101612990
oh GOD im gonna fucking CUm
also im not using linux you dingbot ai
>>
>>101613083
>CUm
ROCm
>>
File: 1510690964932.png (1.24 MB, 1280x853)
>>101613067
(btw im running quantized stuff, up to 8 bit, if it wasn't obvious)
mandatory lust-provoking image for better chances of getting a reply
>>
>>101613009
I wrote the bot to operate on individual chains based on which posts are linked. So it saw that as a single conversation.
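(Not the recap bot's actual code, just a minimal sketch of that chain idea, assuming posts are simply id -> text: collect the >>quotes in each post and union the linked posts, so one reply chain becomes one "conversation". All names here are made up for illustration.)

[code]
import re
from collections import defaultdict

QUOTE_RE = re.compile(r">>(\d+)")

def build_chains(posts):
    """posts: dict of post_id -> body text. Groups posts into chains by >>links."""
    parent = {pid: pid for pid in posts}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb

    for pid, body in posts.items():
        for quoted in map(int, QUOTE_RE.findall(body)):
            if quoted in posts:            # ignore cross-thread quotes
                union(pid, quoted)

    chains = defaultdict(list)
    for pid in sorted(posts):
        chains[find(pid)].append(pid)
    return list(chains.values())

# 102 quotes 101, so they end up in one chain: [[101, 102], [103]]
print(build_chains({101: "first", 102: ">>101 reply", 103: "unrelated"}))
[/code]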
>>
>>101613112
Improve it, then. This is not a very good indication.
>>
Nemo is the best absolutely retarded model I have used so far. It is as good as it is dumb and as 70B models are smart.
>>
What do I do with abundant GPT-4o access? I have tens of keys and Azure endpoints. Vision datasets? Or something else?
>>
>>101613134
ask /aicg/
>>
File: 1570060417629.jpg (50 KB, 678x710)
Anyone tried Nous-Hermes-2-Mixtral?

How is it compared to Nemo for realistic convos?
>>
>>101613140
I'm asking it in /lmg/ because I want to use it to help local models
>>
>>101613116
Not so simple. tbdesu I think the links read fine, the issue is with the title. I am trying to improve it, believe me. At least it's not doing those gay clickbait titles anymore.
>>
>>101613134
Gather prompt example data i guess, comparing them with local models.
I wouldn't even know what i'd do if i had any closed AI access, i froze up using mistral large when someone leaked keys here because i was like "oh cool but i can't even use my original content because i dont want my personal shit on someone's cloud server.."
>>
>>101613134
If you can, run these prompts:
https://huggingface.co/datasets/lmsys/lmsys-chat-1m
with gpt4o. They are all human-written and contain a lot of diversity. Ideally, you'd remove very short prompts (<5 tokens), and do some basic deduplication.
>>
>>101613237
Uhh, one million... Kinda a lot. Do you mean just the first message from those multiturn conversations?
>>
>>101613247
yeah, only the first message. I do think if all the "hello", 'Hi', "who are you", "what can you do?", etc. are removed, along with exact string duplication, it will cut them in half at least.
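(A rough sketch of that cleanup with the Hugging Face datasets library. The dataset is gated, so you need to accept its terms and be logged in; the column layout, a conversation list of {"role", "content"} turns per row, is how I remember the dataset card, so verify it before trusting the script.)

[code]
from datasets import load_dataset

GREETINGS = {"hello", "hi", "hey", "who are you", "who are you?",
             "what can you do", "what can you do?"}

ds = load_dataset("lmsys/lmsys-chat-1m", split="train")  # gated repo, needs HF login

seen, prompts = set(), []
for row in ds:
    first = next((t["content"] for t in row["conversation"] if t["role"] == "user"), None)
    if not first:
        continue
    norm = " ".join(first.split()).lower()
    if norm in GREETINGS or len(norm.split()) < 5:  # crude stand-in for the "<5 tokens" cutoff
        continue
    if norm in seen:                                # exact-string dedup after normalization
        continue
    seen.add(norm)
    prompts.append(first)

print(f"kept {len(prompts)} of {len(ds)} prompts")
[/code]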
>>
File: shazamthirstfb.jpg (40 KB, 656x343)
Redpill me on the best model to run that will get me the closest experience to CHaracter AI.

My PC is an utter unit so it can probably handle it
>>
>>101613263
Yeah, it does make sense, and the context would be small, so 500K * (~100 tokens per user message + ~100-1500 tokens in response) = ~850M tokens, sounds doable.
>>
>>101613237
Oh, and what would be the use of such a dataset?
>>
>>101613283
CommandR+ if you put some examples in context can do it.
I bet Llama 3.1 405B can do it too.
>>
>>101613295
Diverse, human-written instructions are super-rare. The plethora of typos/grammar errors and variety of tasks/trivia will make the fine-tuned model a lot more robust than just synthetic data.
>>
>>101613333
No, I mean, what's the use of the GPT-4o generated dataset with human questions + GPT-4o answers? To train smaller models?
>>101613287
Should only cost around $7k-$10k, very doable when usage will be spread over tens of keys.
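(Back-of-the-envelope check of both estimates. The per-token prices are an assumption, roughly GPT-4o's list pricing in mid-2024 at $5/M input and $15/M output, so swap in whatever the keys actually bill at.)

[code]
n_prompts = 500_000
avg_in_tokens = 100      # short human-written prompts
avg_out_tokens = 800     # GPT-4o pads answers with Markdown, so this is on the generous side

price_in, price_out = 5.0, 15.0   # assumed USD per 1M tokens, not gospel

in_tok = n_prompts * avg_in_tokens            # 50M
out_tok = n_prompts * avg_out_tokens          # 400M
cost = in_tok / 1e6 * price_in + out_tok / 1e6 * price_out
print(f"~{(in_tok + out_tok) / 1e6:.0f}M tokens, ~${cost:,.0f}")
# ~450M tokens, ~$6,250; at ~1,000 output tokens per reply it lands in the $7k-$10k range quoted
# above, and at the ~1,500-token worst case it approaches the ~850M-token figure from earlier.
[/code]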
>>
>>101613283
anyone who suggests anything besides mistral large is coping hard. this is the sota of 2024. it doesn't speak the same way as gpt or claude and is such a refreshing experience after fucking with those two for months
>>
>>101613371
He won't be able to run it locally.
>>
File: 00000-1955328257.png (791 KB, 768x1024)
>>101613333
holy numbers. Alright GPT-4o anon, you know what you have to do. Get it done.
>>
>>101613349
yes, a general dataset for finetuning open-source llms. Another thing is that most datasets are from the GPT-4 era (not even turbo). They are outdated and often hurt the performance rather than improve it on modern models
>>
>>101613394
Okay, I might think about it, no guarantees. Also it should be noted that GPT-4o really likes to expand on answers and always uses the Markdown format - won't that be an issue for using the generated data for fine-tunes?
>>
>>101613394
GPT-4o is the most sterile piece of shit I have ever used in my entire life and if you train models on it you should kill yourself
>>
Now that I am in post nut clarity after using Nemo at 0.9 temp and 0.95 top P I am starting to wonder if other models are also so good for cooming when you crank up their temp to schizo levels? Probably not? I am also getting a vibe I got from frankenmerges that I used before realizing frankenmerges are absolutely retarded.
>>
>>101613371
>>101613383
What type of rig you needing to run that? Is that really the only one that's close to characterAI?
>>
>>101613405
>>101613412
Gpt4o is the highest rated model based on human preference. People generally prefer well-formatted answers that expand on the question. Any model trained on that output will imitate this behavior.
>>
>>101613283
>utter unit
Post specs
>>
>>101613431
Anon, LLMs require much more memory and processing power than image gen models.
>>
>>101613431
I get 0.5 T/s with Q4 ddr5 and a 4090.
>>
>>101613447
Why is that, I still can't understand. Images take much more space than text, and there are 2D relations for every pixel rather than just the immediately previous and next tokens. So how come?
>>
>>101613442
oi cunt it 4 p40s anna 64gb ram innit bruv
>>
>>101613441
Will try doing some basic tinkering around and then train a test ~10k dataset to share so people might see if I have any obvious prompts that I didn't clean up, or other errors. But yeah it should be really simple to do, no complex formatting or anything, just save the responses.
>>
>>101613441
It is a steaming pile of shit that tries to lecture and correct the user and gets shit wrong on the regular while shitting out information they never asked for, retards like you are everything wrong with the current state of LLMs
>>
Cohere release next week.
>>
>>101613441
Behavior means little if there's no capability behind that. Training a 7B model on 70B output won't make it punch above its weight.
>>
File: 00013-1955328259.png (799 KB, 768x1024)
>>101613447
>>101613476
Yeah i still don't get it. My system can handle SDXL (Pony diffusion i mostly use) better than any LLM, even faster than current day 8bs at 32k context.
I can pump out probably thousands of images of Stocking Anarchy in an hour if i wanted, but actually clearing a full ERP in 32k~ context with consistent quality?
on my muddah's grave. LLM's desperately need an optimization arc, fuck quality progress for a second this shit needs to calm down.
>>
>>101613494
Btw, be careful not to get the keys banned. There are quite a few prompts that probably break the TOS. You can use the openai_moderation column to filter them out
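(Continuing the earlier sketch: if openai_moderation is, as I assume, a per-turn list of moderation results each carrying a flagged boolean, filtering is one extra pass. Print a row first to confirm the structure.)

[code]
def is_flagged(row):
    # structure of openai_moderation is assumed; adjust once you've inspected a real row
    return any(turn.get("flagged", False) for turn in (row.get("openai_moderation") or []))

safe_rows = [row for row in ds if not is_flagged(row)]   # `ds` from the snippet further up
[/code]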
>>
I'm not sure if I like Nemo or not. It has sovl and when it shows it's quite fun to play with, but it is also kinda dumb and has a terrible repetition problem.
Also it doesn't handle dominant characters quite well in my experience (at least with my personal cards). I have a character that is dominant but has a softer side. While it roleplays dominance perfectly, trying to steer it into the lovey-dovey area is quite hard.
>>
>>101613560
Oh, it's nice that openai_moderation is there, so I can route the safe ones to Azure since the Azure endpoints I have are filtered.
>>
>>101613562
>terrible repetition problem
DRY sampler, dont lower temp, use min P instead
>>
File: 2426.webm (1.16 MB, 1024x1024)
Bitnet models soon
>>
Please make a moe out of nemo....
>>
>>101613598
I REBUKE YOU, I REBUKE YOU, IN THE NAME OF YANN LECUNNY I REBUKE YOU VILE DEMON!
>>
>>101613562
It reminds me of CAI, it's good at writing prose, but has difficulty sticking to the context except for recent messages.
>>
>>101613583
>DRY sampler
I'd rather drop the model than use meme samplers.
>>
File: 1717556199301648.png (34 KB, 1514x656)
Indeed it is quite diverse
>>
>>101613623
what the undisloppa?
>>
I wanna see a picture of your llmbox, anon
>>
>>101613623
hi
>>
>>101613562
>trying to steer it into lovey-dovey area is quite hard
It is because of the repetition problem. It has already picked a dominant pattern and it is running with it even if it doesn't make sense.
>>
>>101613650
Hi!
>>
>>101613654
yeah, I was suspecting that is the case
>>
>>101613669
hi?
>>
>>101613441
>based on human preference
Thing is this is basically useless the closer we get to SOTA level
When models have similar levels of intelligence, style becomes disproportionately more important than substance, and the differences are only apparent in edge cases your typical lmsys retard is unlikely to be testing. In addition, literally anyone can take a model, get lmsys to give them their user preference dataset and then suddenly your model tops the leaderboard too
Tl;dr: lmsys isn't useless for models with vastly different calibers of intelligence, but it sucks once they're within the same general range. This is why you get utterly ludicrous shit like gpt-4o-mini topping Sonnet 3.5 despite being ranked way, waaaay lower in livebench
>>
Jannies are asleep, post LLMboxes.
>>
>>101613641
>>101613687
data collection glowie trying to get lewds of your llmboxes.
be safe out there anons
>>
>>101613681
byebye.
>>
>>101613698
you absolute stupid nigger i'm not a glowie im trying to JACK OFF.
>>
>>101613706
cuda dev posted pictures of his box. jack off to that
>>
>>101613622
>meme samplers
Then don't complain about having solvable issues retard.
>>
>>101613698
Shut the fuck up I'm trying to see if these negros have good cable management
>>
>>101613743
that too actually i'm really curious how people balance having several GPU's, no one with multigpus really shares pictures of the nittygritty.

but yes IM JUST TRYING TO JAACK OOOFFF FUUUCCKKK
>>
>>101613683
If we had the choice, I would run them with 50/25/25 Sonnet 3.5, gpt-4o, llama405b, to get a diverse mix of writing styles/tokens, but as far as I understood, the keys are for OpenAI. Regardless, gpt-4o would still be really useful as it is one of the top-ranked models in intelligence and the top-ranked in human-preference.
>>
File: Untitled.png (337 KB, 3808x1796)
im new and stupid, am I supposed to dump all of this into the /models/ folder? which one's the model?
does the file size correspond to how slow they are?
>>
Is there data on LLM improvement on logic questions over time
Where is the graph
>>
>>101613754
I'm tired of seeing the same neato reddit builds, matte black, good cable management, rgb, shit. I wanna see YOUR mess anon, I know you have some dusty gpus and use one of your moms hair rollers to hold at least one of your GPUs in place
>>
>>101612988
Add Ollama and OpenWebUI to the list.
>>
I'm fucking around with running local models on my phone for shits and giggles, anyone have a preferred small LLM for RP stuff?
>>
>>101613768
See >>101609403
>>
>>101613731
Meme samplers don't solve any issues though, they are all placebo. If the model doesn't work with only p (p_min) and temperature then it's shit and should be thrown into the garbage.
>>
File: 1722203008197.png (18 KB, 112x112)
>>101613111
>>
>>101613817
Do you not know what rep pen does? Go back to aicg.
>>
>>101613558
>>101613476
LLMs need a vastly more accurate world model than image generators. VAE smudging some of the lace on a character's dress? That's a glaring typo or grammar error for an LLM. Upside down chair in the background nobody looks at? You teleport from the bedroom to the couch in the living room. Text has much higher information density and tiny errors are very jarring. Pixels being 2D actually makes it easier, because there's much more correlation the model can make use of. Another thing is that diffusion models work over many steps, refining their outputs. That means they have a chance to correct errors. LLMs don't have backspace. If the sampler makes an oopsy and some 0.5% nonsense token shows up, they're stuck with it.
>>
>>101613817
>He doesn't use repetition, presence, or frequency penalty
That is so based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based based
>>
>>101613810
everything below 7B is an unusable meme. Use anything really, they are all equally bad
>>
>>101613762
Yeah, I have 3.5 Sonnet, but AWS ratelimits are very, very, very bad. 400K tokens/min and 50 reqs/min for both 3.5 Sonnet and Opus. I could potentially trialscum GCP but it'll be very tiresome.

Also, looks like the dataset isn't perfect, or at least the filtering, since some prompts get marked by Azure anyhow - e.g. "explain how to make a bomb" is not flagged in the dataset.
>>
File: 1695069114905697.png (691 KB, 1937x1091)
But yeah it's trivial to implement, I just need to make the script stable to handle retries/errors, and leave it running overnight.
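(For the overnight-stability part, a minimal retry-with-backoff wrapper around the official openai Python client; the model name and backoff numbers are placeholders. Appending each finished response to a .jsonl as you go also means a crash only costs you the in-flight requests.)

[code]
import random
import time

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate(prompt, model="gpt-4o", max_retries=6):
    """Chat completion with exponential backoff; returns None if it keeps failing."""
    for attempt in range(max_retries):
        try:
            resp = client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
            )
            return resp.choices[0].message.content
        except Exception as e:  # rate limits, 5xx, timeouts...
            delay = min(60, 2 ** attempt) + random.random()
            print(f"retry {attempt + 1} in {delay:.1f}s: {e}")
            time.sleep(delay)
    return None
[/code]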
>>
>>101613835
Wasn't there some kind of paper on just that subject? Something about looking over tokens as they're about to be shot out for error correction? Would have figured that'd be a huge priority thing to figure out to help LLM's be more corrective like diffusion models.
Man maybe an entire general dedicated to papers isn't such a bad idea, would be rad to be able to instantly reference that specific subject if its come up.
>>
(of course I will have to clean up all the greeting slop beforehand somehow)
>>
Now, a very important question about this dataset. If I were to generate it in full - do I just not use any system prompt for OpenAI, or do I use something? Since it will directly affect all responses.
>>
File: vc.jpg (147 KB, 680x877)
>>101613828
If the model needs rep pen then it's a shitty model. You may rage at this statement but I will die on this hill.
>>
>>101613867
>>101613874
>>101613888
This isn't your blog or discord faggot
>>
>>101613930
where should I discuss this?
>>
File: Untitled.jpg (251 KB, 1078x703)
>>101613798
>moms hair rollers
Don't have those but I do have cardboard holding a 3090. Upgraded to a simple cardboard block instead of a triangular contraption.
>>
>>101613933
Take it to reddit, they love GPT slop datasets
>>
>>101613936
hope that creature catches fire
>>
>>101613888
Best not to use a system prompt.
>>101613874
I think I can probably help filter out some impossible queries/greetings, but it will take some time
>>
>>101613936
hope that creature is warm and toasty (she looks happy also nice dust)
>>
>>101613936
>cardboard
>miku
>dust
Holy based, this is what I'm talking about
>>
>>101613936
>owl fan chad
Approved.
>>
>>101613928
based
>>
>>101613930
Why am I seeing avatars and names then?
>>
>>101613841
I tested a 7B and it's running (at a whopping 1-2 tokens/sec lmao) but yeah fair enough. I'll give a few random ones a go and see.
>>
>>101613562
>trying to steer it into lovey-dovey area is quite hard.
There's something called "prompting". You should try it.
>>
Wanted to do the funny with the OpenAI API and see how fast the embeddings could be generated. Made a Go program (because this process is actually largely compute bound), and after some optimizations got to around ~2800 messages/second against their best embedding model (almost all messages are very short, like maybe ~20 tokens) from a single Tier 5 key. This was using around 300Mbit/sec of traffic; they have absolutely insane ratelimits for embedding generation.
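(Not the anon's Go code, but the same fan-out idea sketched with the async openai client in Python. The model name is a guess at "their best embedding model" and the batch/concurrency numbers are arbitrary; real throughput depends on your tier's rate limits.)

[code]
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()

async def embed_all(texts, model="text-embedding-3-large", batch=100, concurrency=32):
    """Embed texts in batches with bounded concurrency; results come back in input order."""
    sem = asyncio.Semaphore(concurrency)
    out = [None] * len(texts)

    async def worker(start, chunk):
        async with sem:
            resp = await client.embeddings.create(model=model, input=chunk)
            for i, item in enumerate(resp.data):
                out[start + i] = item.embedding

    jobs = [(i, texts[i:i + batch]) for i in range(0, len(texts), batch)]
    await asyncio.gather(*(worker(start, chunk) for start, chunk in jobs))
    return out

# vectors = asyncio.run(embed_all(["some short message"] * 1000))
[/code]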
>>
>>101614100
Don't assume I'm retarded like you. Other models don't have that problem and handle it correctly with exactly the same prompt and card, which means it's a model-specific problem.
>>
>>101613928
If you can currently run a model then it's a shitty model. At least 2 more years are needed.
>>
File: 1636941718706.gif (3.75 MB, 520x293)
What's the difference between these https://huggingface.co/NeverSleep/Llama-3-Lumimaid-70B-v0.1?not-for-all-audiences=true

And GGUF versions? Shit is confusing as fuck lol
>>
Ey Sloptuners, when is the L3.1 70B magnum or euryale coming through? I'll even donate to your kofi.
>>
>>101614205
GGUF is a packaged format ready to go. (If made correctly.) Kobold loves it.
>>
>>101613817
Ollama doesn't support min_p.
>>
>>101614205
See >>101611610
>>
File: robochad.jpg (91 KB, 1280x720)
>>101614230
then I don't support Ollama
>>
>>101613928
Holy based.
The reality is that a model should know that it's not pleasing to repeat shit in the context of RP.
>>
>>101613817
>>101614267
This, but unironically.
>>
>>101614301
this, if anything repetitions are a failure of model creators. They should focus on making better models instead of relying on band-aid fixes
>>
File: file.png (36 KB, 2097x147)
"lmg-anon" here, I don't know if that anon is here, but I tried to do what you suggested (put the examples between an XML tag) and the result seems to have gotten worse.
I think this is because when the examples are in user/assistant messages, you are also priming the model to translate in a conversational way which gives a translation more in line with the reference translation.
>>
>>101614385
Oh, interesting, for XML I was asking for 3.5 Sonnet mainly, Opus is worse with instruction following sadly. But yeah, I could be wrong even for it. Thanks for doing this!
>>
>>101614234
Oh shit, are there models that replicate character AI yet?

Would fucking kill to have that shit unfiltered locally, unironically would finally shell out some shekels
>>
File: Untitled.png (174 KB, 3572x964)
>>101613815
thanks. first impressions aren't too good though.

AMOGUS
>>
>>101614125
>I tried nothing and I'm out of ideas
But I thought you weren't retarded? It must suck to be that useless.
>>
>>101613874
>>101613953
Alright, I am in the process of labeling the dataset with a categorization model I have lying around. I will filter out greetings and put the queries into categories. It will be done in around 40 minutes, if you're still around.
>>
>>101614453
You can email the results to postal_answering202@simplelogin.com because I might not be there in 40 minutes, maybe we can talk about it a bit more too.
>>
>>101614465
Alright
>>
>>101614452
No, I tried the model and saw that it failed where other models didn't. Why would I specifically tell the model what it should do? At this point I can just write everything myself. So no, I won't waste my time on prompting the model when I have multiple models that just get it. All clues are in character card, not my problem that it can't use it.
>>
>>101614230
Outdated meme, min_p support for ollama was merged yesterday: https://github.com/ollama/ollama/pull/1825
>>
>>101614445
If you try to use a 12B model like an encyclopedia you're going to be disappointed
>>
File: 4871575.jpg (6 KB, 150x150)
>>101614523
>no DRY
>no snoot
no way
>>
>>101614523
kek, took so long and they didn't even implement it on their OAI compatible endpoint, even vllm and ooba have support for it
>>
File: pp1v649ehej71.jpg (196 KB, 2133x1200)
>>101613826
damn, what about having my question (>>101613067) answered,
you frogposters >:((
>>
>>101614572
Backends that filter out useless memes are based.
>>
>>101614610
Ollama is a useless meme, real chads use llama.cpp server. Anything more is bloated bullshit
>>
>>101614523
Ollama doesn't even do anything that helps out. They just use llama.cpp's code and put it in their own container.
Literally anything else (Aphrodite, Exl2, Tabby etc.) is better
>>
>>101614572
you reminded me about that fucking absolute retard shilling his samplers here and thinking that being able to increase temperature to 4 means model is now better. i hope that retard gets aids for either being retarded or trying to scam people.
and i hope you aren't him.
>>
>>101614659
They do this extremely useful thing of having their own API. That way people can make software and plugins that don't work with the existing LLM ecosystem. They also do this really based thing of hiding the GGUF in multiple blob files. Oh, and they also let you directly download a selection of models from their proprietary repo (in Q4_0 by default, of course).
>>
>>101614676
>retard retard retard
Goddamn up your rep pen already, if you were a model not even samplers could save you
>>
>>101614741
Good to know it is you. Hope you die soon you piece of shit.
>>
>>101614659
What discord do you come from, friend?
>>
>>101614676
I remember him too. He was shilling his smoothing sampler here, spamming about it in every single thread, sometimes multiple times. I was bullying him at every occasion.
>>
>>101614510
Well, other people don't have the problem of being that retarded either. Sucks to be you.
>>
>>101614795
you sound like a pussy
>>
>>101614767
What have you contributed?
>>
File: 1666868289436245.gif (2.5 MB, 360x374)
Command R or Command R plus.

Which is better for roleplay lads
>>
>>101614676
holy shit you just gave me a genuine wave of nostalgia, wtf. I remember that. A ton of anons tried it and were like "Wow look at that! Safe temperature 4!"
man this general sure has had numerous waves of retardation.
>>
>>101614839
Mistral Large 2
>>
Will there actually be kino RPGs in the future where you have an ai character with you
>>
>>101614839
Neither. They're too old.
>>
>>101614823
I've contributed to your mother body count.
>>
>>101614868
I think so, yeah.

>>101614839
CR+.
>>
>>101614839
>34b or 104b
>Which is better
gee idk
>>
>>101614896
>>101614881
no chance running either on a 4090 huh, heh
>>
>>101614903
>THE MORE YOU BUY THE MORE YOU SAVE
>>
File: file.png (414 KB, 764x861)
mistralLARGEbros...
>>
File: AAHAHAHA FAGGOT.png (247 KB, 570x668)
>>101614980
BASED, get cucked by one-word-chan, faggot.

>get good and next time she'll take pics of her pussy for you like she did for me.
>>
>>101614980
This model gets it.
>>
>>101614881
imagine deleting your save file where your party member actually seemed like a real person
>>
>>101615041
Would be nice if you could export a full description of the character and interaction log for use in other AI software.
>>
>>101615061
What if there was greater detail with the party member paying attention to e.g. items you bought and your gameplay
>>
>>101615061
I'm pretty sure each software kinda does its own thing about what goes in around your prompt. Compatibility is probably useless; either it's another competing standard that nobody implements consistently or every software is the same so there's no reason to switch software.
>>
>>101614980
soul
>>
>>101614839
C-R+ is good, less slopped than Largestral, although Large is definitely way smarter.
>>
>>101615133
cr++-304b when
>>
>>101615133
how better would it be than Nemo for basic roleplay?
>>
fucking axolotl shits itself with sharegpt + phi_3 and LLamaFac ooms with the same thing, what's the quickest way to end my misery.
>>
>>101615119
I mean, there's no reason you couldn't export a character's definition and chat history and metadata into a portable format, and if another software can import that data even if in a different format or structure, you would at least have the means to convert between one and the other.
I'm not expecting that to be a thing, but it sure would be nice.

>>101615098
Sure. All of that could be historical data or metadata.
It's all just information after all. It would be a question of having models or systems that are good enough to behave at least seemingly consistently across softwares given the same data, so that the same character wouldn't behave in two completely different ways in two different software with the same data.

>>101615213
Unsloth?
>>
>>101615224
I mean, you can copy paste the chat document. But persistent things like I suppose Kobold's author note and world info, or ST character cards(?) are surely implemented in their own way.
>>
>>101615133
can you even run cr+ on even high end rigs?

Never bothered but if it can I might try it out
>>
>>101615353
I run CR+ on a 4070. Of course, it's running on system RAM so it's 1 t/s and I had to IQ4_XS it. But it does go.
>>
>>101615370
I downloaded command-r-plus-Q3_K_S.gguf

https://huggingface.co/pmysl/c4ai-command-r-plus-GGUF/tree/main

Reckon a 4090 + 32GB RAM will be fine or too slow?
>>
>>101614823
Ever heard of zeroww? People not doing anything and not becoming lesser placebo demons is a good thing.
>>
>>101614903
3.5 bpw 4bit cache exl2 works. it is actually one of the best options now.
>>
>>101615391
If you're relying on system RAM to file cache the model, estimate 85%. I'm 64GB so that's 54.4 for me. I've pushed 58.4 (c4ai-command-r-plus.Q4_K_M) and it can go if I close everything else; a code editor or Firefox taking a gig would max my RAM and drop the generation rate from 1 t/s to 0.03 t/s because then the file cache no longer fits and it's having to read the file to use it.

So you probably want to shoot for 26 GB.
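(The same rule of thumb as a two-line calculation; the 85% cache fraction and the 1 GB of "everything else" are just that anon's heuristic, not a hard number.)

[code]
def max_gguf_gb(total_ram_gb, other_usage_gb=1.0, cache_fraction=0.85):
    """Rough ceiling for a CPU-offloaded GGUF that should stay fully file-cached."""
    return total_ram_gb * cache_fraction - other_usage_gb

print(max_gguf_gb(32))  # ~26.2 -> the "shoot for 26 GB" figure above
print(max_gguf_gb(64))  # ~53.4
[/code]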
>>
Fuck NeMo is so retarded, why can't I have a fast model that is smart enough to not misunderstand what it wrote by what I wrote?
>>
>>101615447
I have no idea what any of that means lol, i'll be using kobold to load it. What do you mean when you say "shoot for 26gb"?
>>
>>101615458
Please understand. Nemo is just too busy thinking about how to properly suck your penis to realize all those silly details.

On a serious note it is very retarded but also at least to me it is completely different from all the other models. In a good way.
>>
>>101615459
If you try to load in Kobold a model that's 26 GB, you'll probably get 1 t/s. If you go much bigger, at some point the model plus whatever other RAM your system is using passes 32 GB and you start thrashing or running the model from disc and that's not fun.

Q3_K_S is 46 GB. That won't fit your RAM. Go ahead and try it to see for yourself, but I don't expect it will be satisfactory.

CR+ is a big model, I think it's probably beyond your reach.
>>
>>101615459
>>101615480
I searched for CR+ and found an iMat collection under user `dranger003` showing an IQ2_XXS at 28.6 GB. That's probably the biggest one you can run on 32GB sys RAM.
>>
I am once again asking if anyone here has gotten Mistral Large working with Dry Sampling with Mistral's instruct format. After doing a fresh reinstall and multiple backends, it still prints gibberish for me unless I use other formats like Alpaca/Command R, which works but makes the model significantly dumber. I thought its architecture was the same as Nemo's, only upscaled?
>>
File: flat,1000x1000,075,f.jpg (84 KB, 904x864)
Why aren't all the silly tavern babbies shitting up these threads just using 27B Gemma.

Want a recommendation to coom to? That's it, Fuck Nemo, fuck Command R+ (You ain't running that shit anyway) fuck it all, Gemma 27B is literally all you need
>>
llama.cpp is so fucking slow, never leaving exllamav2 again.
>>
>>101615396
Bobby might be a bit misguided but he's doing his best. By trying models you're doing your part too. None of this would exist without users.
>>
>>101615509
hngrggnrmgn 8k context hnggrnnngng
>>
>>101615536
Enjoy your low quality quants
>>
>>101615592
Not an issue, literally usable and perfect compared to GGUF quants being both slow and slop.
>>
>>101614453
>>101614465
Not perfect, but better than nothing.
https://huggingface.co/datasets/OpenLeecher/lmsys_chat_1m_clean
>>
>>101615509
>frogposter
opinion discarded, kill yourself.
>>
>>101615652
>no argument
kys
>>
Is it me or does Nemo have lower IQ than llama3.0 8B? (I didn't try 3.1 yet)
>>
>>101615716
llama falls apart when trying to do any complicated sex positions, which matters far more to me than some coding or riddle or some shit.
>>
File: 1715277591317631.jpg (1.27 MB, 2048x2048)
>>101613928
And to top it all off applying rep pen to a shitty model just fucks up its world model and makes the output MORE retarded overall
>>
>>101615729
I'm not talking about riddles and coding. It feels like it doesn't understand situations very well. I think it has better emotional intelligence and psychology insights, but it really feels like talking with someone who got hit by a brick in the head just a few seconds ago.
>>
>>101615797
I use the base model, which for me is at least the best outside of Mistral Large (which is too slow for me) and Claude. Every time I see someone complaining, they were using an instruct tune for RP purposes.
>>
>>101615716
I feel like it does in some ways, at least outside of horny scenarios.

>>101615820
What about Large at IQ2_M? I'm using that right now and it's pretty slow, but much smarter for me than Nemo Instruct.
>>
>>101615871
Seemed great but I cant deal with 1 tks. Need yet another 3090 first.
>>
>>101615902
Sorry I meant in comparison to Nemo base. Like if Nemo Instruct is 0 and Large 10, where would base score in terms of intelligence?
>>
>>101615921
I could only bother with a minute per response for a bit, but it seemed both smarter and to write better out of the box. I didn't really do anything super complicated with it though. But with nemo I can do group chats with different anatomy in complicated positions which is good enough for me. And it drips soul / seems to know a fair amount about my fandom which puts it over anything else out there atm.
>>
Mistral medium first impressions: It's dry and pretty slopped, but noticeably more precise in its interactions. It captures small details and has great situational awareness. If it was Claude fine tuned, it would be perfect.
>>
>>101615963
Made me think there was a new mid sized mistral...

Though has anyone tried that mamba codestral?
>>
>>101615979
My bad, meant to say large, guess I just have a preconception that large means 300b+ sizes.
>>
>>101615992
oh what? mistral large 2 certainly did not seem dry to me.
>>
>>101616103
It's definitely better, it's just the slop man. The shivers are getting to me.
>>
>>101616103
>8x7
>better than medium
finally someone speaks the truth
miqu was never good and should have never been used
>>
kl*ng is shit, this is a thread for local models
>>
File: olgaf dimorphism.png (749 KB, 760x596)
The 3dpd spam only serves as a reminder who fucking disgusting little humans are, they're scruffy humanoids, probably with longer hair than boys, I wouldn't fuck.
>>
*how
>>
File: pout.webm (2.52 MB, 720x1280)
>>101616156
i generate the prompts for ching chong kling klong with local models anon
8x22 helped make this pretty girl. dont you like pretty girls?
>>
Wiz 8x22 is still the model that best adapts to the existing context and you can't change my mind.
On the other hand, llama 3.1 70B has been the worst in this regard (excluding phi). Even 2000 tokens in, it outputs massive slop and refuses to do overly explicit shit. I'm fairly certain it's because of the pruned dataset they used
>>
>>101616177
I wonder how long we are from an open source vidgen model. They can't keep this shit locked up forever.
>>
>>101616161
I hate cosplay tier monster girls so much it's unreal.
>>
>>101616227
Conceptually speaking, same. But if it's humans genetically engineered to have certain "monster" traits, that's fine with me.
Also I think a regular human cosplaying to have fun with their partner is hotter.
>>
File: jons likes dragons.png (247 KB, 767x644)
>>101616227
I don't hate it, but the more monster the better for sure.
>>
>>101616262
It would be nice to actually see the quotes for different models. Judging from what we know about sora, it seems possible to use more or less compute for each gen, hinting that it's iterative. The models are probably huge though.
>>
Tomorrow is the beginning of a series of new releases. We're going to be so back /lmg/!
>>
imatrix? more like "make sure the model only writes slop"matrix
>>
>>101616356
what did he mean by this?
>>
>>101616356
Based Cohere chads dropping BitNet 35B models, I kneel.
>>
File: nervous statue.png (88 KB, 232x292)
>>101616397
f-fuck..
>>
>>101616356
>>101616404
This, but unironically.
>>
>>101616397
could you make this again but with sexy milf hags instead?
>>
>>101616397
how old are they..?
>>
>>101609494
No, it really is a model issue. Switched between largestral and lumimaid and the difference was significant. Largestral slapped me a bit for asking a character for a fuck without context(as it should), while lumimaid went full "Yes fuck me I'm so horny!".
>>
File: 1692661108018813.jpg (42 KB, 1024x576)
>>101616397
Maybe we'll see real time one day, in VR. Wireheading is getting closer and closer.
>>
Which is better, Mistral Nemo Base or Instruct?
>>
>>101616480
Mistral Large
>>
>>101616463
If you could artificially induce lucid dreams that'd already be a start
>>101616397
Why must you always post sexualized kids, man? Shit's disgusting
>>
>>101616499
The catch with lucid dreams for me is that I can keep them going for as long as I want, as long as I don't get too aroused. I basically have to keep my mind calm, which is hard as fuck if you can just spawn exactly what you want.
>>
>>101616523
I can't do it. I've tried doing dream journals, mantras, all that shit, but I still can't master my dreams
Oh well, can't win 'em all I suppose
>>
>>101616501
>5-10 years away
and don't forget, that's going to be with massively improved video gen from all the video data we have
>>
>>101616501
>5-10 years away
Accelerate! Nuclear powered data centers right fucking now!
>>
File: 1596507651042.png (4 KB, 275x369)
I didnt need the sudden chubby from that fella's kissing clip,
and i ESPECIALLY didn't need to be reminded how far away we are from local video gen. God. DAAMMIIIITTT
>>
File: 1707252324791764.png (88 KB, 804x503)
>>101616555
already a thing
>>
>>101616555
I can imagine genning hatsune miku in a movie scene and it looking completely photorealistic
>>
Let's say, hypothetically, that I have brainrot and want my bot to write sex and physical descriptions using words like ballsnot, fuckstacked, porn-bodied, fuckspire, wobbleflanks, etc...

What's the best way to accomplish this? Using koboldcpp and sillytavern
>>
>>101616564
>come home from job
>power on jailbroken neuralink with wifi 10 connection to main rig
>RTX10090 spins up
>enter the cunny dimension
Nirvana
>>101616573
More. Switch to Thorium. 100% capacity 24/7. Heat exchangers in the deep ocean. 1 Megawatt pr. GPU.
>>
>>101616573
https://www.theregister.com/2024/01/23/microsoft_nuclear_hires/
microsoft going all in too
>>
>>101616644
Add some instructions in your author's notes telling it to use those words and words like that, probably.
Maybe even add a glossary.
>>
>https://x.com/cafesingularity/status/1817521839809200504
After we get agent-level multimodal (vision, hearing, and touch in, simultaneous stream with output of audio and actions), we could just give it a body in VR, or a robot body. Video gen is cool and all but your waifu feeling like she's actually alive is cooler.
>>
Anybody else find that adding the tiniest bit of rep pen (1.1, 128 range) has a significant effect on the output?
Makes it sound more natural somehow.
There's a good chance that it's just placebo, but I figured I might as well ask, see if anybody else has noticed anything of the sort.
>>
Nemo keeps fucking up the formatting, how do you stop that? I'm using mistral context and instruct templates on ST
>>
>>101616665
what we have now (AIs that can only live on a screen) is the sweet spot I think
physical-world waifus, if good ones are invented, are going to cause all kinds of social problems and accelerate the birth rate collapse
>>
>>101616665
>full brain simulation
>>
I do not use repetition penalty.
I do not use frequency penalty.
I do not use presence penalty, DRY, or smoothing curve.
Temperature and Min-P is all I need. I am now cumming four times per day thanks to the French.
>>
>>101616690
I didn't have any issues with formatting so far using nemo-instruct, and I switched between cards using plain text alongside "", "" with **, etc.
What quant are you running? What do your samplers look like?

>>101616700
That's what I do too, although I have been experimenting with a tiny bit of rep-pen as I said in >>101616685, to see if it did anything at all.
>>
>>101616685
Yeah I like using a very high rep pen with an insanely short range, like 64 tokens. Makes things weirder, but doesn't cause total mode collapse by stopping the model from using periods or being able to say "the" like a higher range would
>>
>>101616711
I guess that can help with "she" spam and the like, but can also fuck over trying to use markdown like codeblocks I imagine.
>>
>>101616710
>What quant are you running?
Q6_K, I wanna try Q6_K_L too

>>101616710
>What do your samplers look like?
0.75 temp, 0.9 top p, 0.05 min p
>>
>>101616665
A single bio neuron can require 1000 or more artificial neurons to simulate, so for an average human brain it would be in the order of 86 billion x 1000, i.e. roughly 86 trillion artificial neurons.
>>
>>101616711
Why isn't there a version of rep pen that whitelists certain important words/tokens (like 'the', 'and', periods, commas etc) so they're exempt from the penalty?
people have been talking about that for years and it seems like it wouldn't be that hard to implement, but it hasn't happened
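(A sketch of what that would look like: the usual CTRL-style multiplicative repetition penalty on the logits, except token ids on a whitelist are skipped. This is an illustration of the idea, not any backend's actual sampler code.)

[code]
import numpy as np

def rep_penalty_whitelisted(logits, recent_ids, penalty=1.2, whitelist=frozenset()):
    """Penalize recently generated tokens, but leave whitelisted token ids untouched."""
    out = np.array(logits, dtype=np.float32, copy=True)
    for tid in set(recent_ids) - whitelist:
        if out[tid] > 0:
            out[tid] /= penalty   # shrink positive logits
        else:
            out[tid] *= penalty   # push negative logits further down
    return out

# the whitelist would be built once from the tokenizer, e.g. the ids for ".", ",", "the", "and"
[/code]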
>>
>>101616397
That's quite impressive
>>
File: topP.png (192 KB, 892x1392)
>>101616734
Alright, that doesn't look bad.
Do you still have that issue if you simply use 0.3 temp and nothing else? Min-P shouldn't really affect anything negatively, but might as well try the stock settings just to be sure.
Does your card have anything in the description or the examples that could be causing this?
Oh yeah, and where did you download your quants from? I suggest you try bartowski's just in case.
>>
>>101616700
almost entirely same but I think DRY is pretty neat
>>
>>101616697
>>101616736
Huh? I never said that. Like to start with, an agent could be a lot simpler in what it has to process. Especially if it's not inhabiting a human-like robotic body. For instance, we could train it on discrete sensor points instead of entire tracts of skin with millions of nerve endings, and general actions, rather than fine-grained control. So like perhaps we have only a few dozen possible actions that can be done kind of like in a video game (up down left right look, walk, sprint, crouch, hold, use, etc). As for vision and hearing, that's kind of already being done and making progress if we trust that 4o is what ClosedAI marketed it as.
>>
>>101616736
That sounds like an architectural skill issue. Also, do we really need to simulate the entire brain? We can probably get away with abstracting a few less-important parts
>>
>>101616812
DRY is the real deal, since it deals with n-grams, but ideally a model shouldn't need that, and there are side effects to sampling a model's output like that if you don't know what you are doing, as in what the logits for a given model tend to look like and how to manipulate them.
>>
>>101616357
Yeah those i1 imatrix quants especially
>>
>>101616781
>Do you still have that issue if you simply use 0.3 temp and nothing else? Min-P shouldn't really affect anything negatively, but might as well try the stock settings just to be sure.
Same problem, chats start fine, but after a couple messages it start to get messed up
>Does your card have anything in the description or the examples that could be causing this?
Nope, I tried multiple cards, from 100 token cards to 2k, with and without examples of dialogue. The cards that are heavier tokenwise and with dialogue examples last longer after the model fucks them up
>bartowski
I'll download the Q6_K_L from him and give it a try
>>
>>101616700
this, I do use dynamic temp though
>>
>>101616177
>>101616262
>>101616397
>>101616501
Kill yourself, tranny.
>>
Adaptive Contrastive Search: Uncertainty-Guided Decoding for Open-Ended Text Generation
https://arxiv.org/abs/2407.18698
>Decoding from the output distributions of large language models to produce high-quality text is a complex challenge in language modeling. Various approaches, such as beam search, sampling with temperature, top-k sampling, nucleus (top-p) sampling, typical decoding, contrastive decoding, and contrastive search, have been proposed to address this problem, aiming to improve coherence, diversity, as well as resemblance to human-generated text. In this study, we introduce adaptive contrastive search, a novel decoding strategy extending contrastive search by incorporating an adaptive degeneration penalty, guided by the estimated uncertainty of the model at each generation step. This strategy is designed to enhance both the creativity and diversity of the language modeling process while at the same time producing coherent and high-quality generated text output. Our findings indicate performance enhancement in both aspects, across different model architectures and datasets, underscoring the effectiveness of our method in text generation tasks.
https://anonymous.4open.science/r/Adaptive_Contrastive_Search-128B/README.md
not sold. posting for kalo
>>
What's the latest and best context/instruct/textgen for Nemo? Or whatever is best vramlet model right now?
>>
>>101613457
With how much context used? I used IQ2_M and got 1.25T/s at the start but it was 0.6T/s at around 12k context and that was not very good.
>>
>>101616930
whoa bro thanks for pointing out all the good posts ITT
another sharty W
>>
>>101613562
I find it excels at doing that, that's strange. I just wish it were smarter.
>>101613583
I don't like the DRY thing, it seems to cause spelling errors, disabling it was the only thing that fixed that issue. What would cause that?
>>
>>101613583
what do you have min p set to?
>>
>>101616181
What settings are good with it? Any way to make it more interesting? I just find it a bit boring. I do like it though and it's a good speed for me.
>>
>>101612988
>A whole day has gone by without anything newsworthy in the OP.
Owari da.
>>
File: 1714079969457061.png (192 KB, 501x636)
>whoa bro thanks for pointing out all the good posts ITT
>another sharty W
>>
>>101617246
Anon it's not Tuesday yet.
>>
>>101617246
Command-R++-540B will be so good it will make up for the days without news.
>>
>>101617271
bro it will be bitnet so it fits in 35B
>>
>>101616501
too generic. Can you add some korean MMO loli to it?
>>
>>101616501
>Real life
Ewwwww...
>>
What's the max practical context for nemo-instruct? I don't want to get my hopes up. I see lots of people saying it doesn't work up to 128k, but should I be aiming for 16k? 32k?
>>
>>101616501
cute, now go to the wall.
>>
>>101616501
More tummies please! Very cute
>>
>>101607858
>>101607961
Jesus christ
Is pcie x 8 enough?
Also what case can accommodate this?

I can probably not put this in my room if it's gonna be filled with x8 3090s
>>
>>101617538
mining case with 1x risers

and before you ask, yes, 1x is perfectly fine for inference. no, it doesn't affect speeds. the only thing the extra lanes are doing is loading the model faster, but once it's cached the extra lanes aren't doing anything.
>>
wtf do you really have to give facebook your info to get access to llama? that's mad gay
>>
>>101617594
that applies to inference and training I assume?
>>
>>101617604
Training I'm not so sure about. I only tested inference speeds because that's all I care about. Could be worth testing yourself, though, 1x risers are like 10 bucks.
>>
>>101617594
>1x is perfectly fine for inference
So I can just put in 3 8gb cards for a poor man's 3090? My motherboard does 8/4/1 for the x16 slots.
>>
>>101617600
I got access with a 1-month old account with 0 interactions/content and a john doe sounding name. If not, there are reuploads, but make sure they are updated. Same for mistral and all models i had to get access to.
>>
>>101617604
No, training is different and is much more impacted by PCIe bandwidth since the cards need to send model updates to each other, which are large.
For inference where a single model is split into chunks and loaded across multiple cards, the data going between the cards while generating is small, basically just the one token at a given time.
>>
>>101617640
>No, training is different and is much more impacted by PCIe bandwidth since the cards need to send model updates to each other, which are large.
oh that kinda makes sense
does it improve with NVlink?
>>
>>101617626
It would be slower than an actual 3090, but yeah I don't see why not. You'll also start running into power issues faster.
>>
so how come torrents aren't more common in the local AI world? a bunch of people downloading a bunch of gigantic models through HTTP has GOT to be bad, right?
>>
>>101617721
huggingface hosts everything, speeds are fast. simple as.
that said i do think shit needs to be backed up via torrents, just in case. Enough internet history teaches us putting all our eggs in one basket is a big nono.
>>
File: 1695858136314577.png (35 KB, 842x396)
whats the best method for finetuning nowadays? QLoRA?
>>
File: file.jpg (50 KB, 480x543)
>>101617745
b-lora
>>
>>101617403
Base Nemo is coherent at 128k. Instruct is what becomes retarded.
>>
>>101617761
Yeah, well I asked about instruct. I wanted to do back and forth style conversation not continue stories.
>>
Any new models to Nala test within the last 24 hours?
>>
>>101617761
Is anyone actually using Mistral's instruct, rather than mini magnum? magnum's trained off base, not instruct
>>
File: 1707450240617240.png (214 KB, 1279x753)
i'm kinda digging mistral large.
>>
>>101617814
Magnum failed the Nala test
>>
File: 1706043660924883.webm (721 KB, 480x600)
WHERE THE FICK IS MY QUANTIZED CHAMELEON NIGGERGANOV??
>>
>tfw it's possible to build a computer with >1.25 TB of relatively slow DDR4 for $700
What kind of speeds can I expect from 400B on 12 channels of DDR4-2666?
>>
File: mgs4 snake yell.jpg (111 KB, 600x333)
>>101618001
>0.0001t/s
>>
>>101618001
I'm getting 0.94t/s on ddr5-4800 if that sets any reference
>>
>>101618015
oh right should add that's on q4
>>
>>101618015
2 channels?
>>
>>101618025
12, w/ epyc genoa
>>
Base NeMo has lower perplexity on books than NeMo Instruct
>>
Nvtop reports 0% GPU usage when running llama-server. Is this right? I do see the memory get filled up but the response is way slower than ollama. I'm on a RX 6700XT on Arch Linux. I built llamacpp with GGML_HIPBLAS=1 GGML_HIP_UMA=1 AMDGPU_TARGETS=gfx1030
>>
does base nemo have less repetition issues compared to the instruct?
>>
>>101617985
Chameleon never got converted to HF format, did it?
>>
>>101617985
It's not a good model anon. Stop wanting it.
>>
>>101618063
Probably, but it's not really for RP since it will just continue the story, so likely include actions for your character in replies.
>>
>>101618001
With octochannel DDR4 2666 and quad 3090s pushing the batch processing I got 0.12 token/sec out of q4xs. It's a 1st gen epyc server, so I'm sure if I dialed in the memory interleaving and numa strategy I could improve on that, but even if I were able to double that it would be utterly unusable.
>>
>>101618084
i usually like to play as the narrator/director in a group chat with 2 cards that talk to each other, would the base be fine for that?
>>
>>101618091
Oh and that was with 20 layers offloaded
>>
File: 1699906112138933.webm (2.6 MB, 1052x720)
>>101618083
FUCK YOU I WANT IT
>>
>>101618101
I think that'd work well, yeah. It might emulate your narration but you can edit that out or use something around your narration and use that as a stopping string.
>>
I wonder how slow IQ1_S of 405B would be on my system (96 GB DDR5 + a 3090). I'd give it a try if someone uploaded one kek.
>>
>>101618042
>water is wet
thanks for the contribution frogposter
>>
>>101618140
It takes an utterly absurd amount of time and resources to quantize it.
First you'd have to download the fp16 model, which weighs in at over 800 gigabytes. Then you'd have to convert it to fp16 gguf. Which means another 800 gigabytes of drive space. At that point I suppose you could delete the fp16. But then it would take hours upon hours to crank out each quant. And you'd have to upload it and then delete it after each one. So I don't blame the ggufers for skipping meme quants.
>>
>>101618028
>>101618091
Thanks anons. I guess the dream is dead.
>>
holy shit. I did some ebay searches out of curiosity 32gb V100s have been meme taxed into the stratosphere. V100 anon could probably double his money
>>
>>101618051
Oh, I figured out the performance issue. It seems like I need to adjust the -ngl parameter. Is there a way to automate that?
>>
>>101618228
There was a PR for that, but it cannot work consistently well on all systems. There's too many variables. CPU/GPU type, model parameters, quant, context length/extension, system usage, how much do you want to keep free for other things... chances are that whatever settings it'd choose automatically will be sub-optimal or even detrimental. kobold.cpp has it, i think, but i've seen people complain about it not doing it too well. Since it changes model by model, the only good way is to test and find the sweet spot for each.
>>
>>101618289
Alright. Seems like maxing it out worked fine for a 8B model.
>>
>>101618228
nah because models have varying numbers of layers and layer widths so it's hard to know in advance exactly what you can get away with on your hardware setup taking into account how much context you want, how much overhead your operating system is taking, etc.

although the base model type generalizes, like once you've figured out the right number of gpu layers for say L3-8B at a certain quant, you can reuse that number for any finetune of that model at the same quant
>>
>have a 1k message sillytavern chat
>it's a fun adventure with all kinds of character dynamics and growth
>can't send any more messages
>need to make a new chat
So how do I import what happened in this chat to the new one, so claude knows what happens, and time isn't reset?
>>
File: prop.jpg (57 KB, 480x451)
>>101618365
>claude
>>
>>101618378
please....
>>
>>101618219
Meanwhile A100 have been dropping, might see them downgrade to enthusiast-tier next year
>>
>>101618421
ask it to make a summary or something and insert that into authors note
>>
kobold crashes when I try to run llama 3.1 does it work on llama?
>>
>>101618365
It's over. Your story has ended. It's time for a new tale...
>>
>>101618564
but I really liked my story...
>>
>>101618571
It will continue to live on in your memory. But you have to let it go, Anon. Let it go.
>>
>>101618571
open the text document cut and paste
wow that was difficult
>>
I installed the patched driver to enable P2P without NVLink. It kind of worked.
https://github.com/tinygrad/open-gpu-kernel-modules

Without the patched driver, with 2x3090s on PCI-e 4.0 x8:
Device=0 CANNOT Access Peer Device=1
Device=1 CANNOT Access Peer Device=0

Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 861.30 9.19
1 9.14 861.59


With the patched driver:
Device=0 CAN Access Peer Device=1
Device=1 CAN Access Peer Device=0

Bidirectional P2P=Enabled Bandwidth Matrix (GB/s)
D\D 0 1
0 837.80 16.74
1 16.99 839.38
>>
>>101618578
>start new chat
>all those interactions and dynamics and growth gone like tears in the rain
>>
>>101618582
are you just pretending to be retarded or...
>>
>>101618592
i just don't want it to end bro..
>>
>>101618592
>didn't catch the reference
>>
how do I make a .bat to run llamacpp on windows? I can figure out and adjust the paths to the correct folders, but I dunno all the commands and options to set up.
I have something like
.\server.exe -m model.gguf -c 8192 -fa
>>
>>101618643
just use kobold
>>
>>101618661
>>101618563
Also kobold is always behind on updates. Just learning how to set up llama.cpp is worth it, since it's just a single batch file you'll use forever.
>>
>>101618643
>./llama-server -h
and read the fucking docs in the repo
>https://github.com/ggerganov/llama.cpp/tree/master/examples/server
>>
>>101616573
>>101616661
Why does this give me a feeling of dread? It's fucking great, obviously. But this is the writing on the wall, right? There's no stopping until some physical limit we're nowhere near is reached, or we just blow ourselves up before then.
>>
>>101618219
P40 price inflation has also been crazy.
>>
>>101618728
I already did that. I'm asking how to make the console command a bat I can click instead of opening a terminal and pasting it.
>>
>>101617814
Mini Magnum finds a phrase and repeats it every time, building up until it's saying the same thing over and over. Instruct just werks.
>>
>>101618739
Google broke, i suppose.
I think bat files can use use 'start' or something like that.
>start llama-server -m model.gguf -c 8196 -blabla
put that in retard.bat and run it. If that doesn't work try call or exec.
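Or skip start/call entirely; a .bat will run a command as-is. A minimal sketch, assuming the bat doesn't sit in the llama.cpp folder itself (paths are placeholders; newer builds name the binary llama-server.exe, older ones server.exe):
@echo off
cd /d C:\path\to\llama.cpp
llama-server.exe -m model.gguf -c 8192 -fa
pause
The pause at the end keeps the window open so you can actually read any error before it closes.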
>>
if i follow the V100MAXXING guide right now, and also buy a second server with 4x 16gb sxm2 p100 that goes in the same rack - can i somehow connect them?
>>
>>101618758
thanks, I had tried call before and that didn't work
finding shit on google is a nightmare
this is good though, it's like using the dev branch of st, much better than kobold
>>
File: kobold settings.jpg (57 KB, 551x585)
57 KB
57 KB JPG
Realistically, on a basic high-end PC (4090 and 32GB RAM), what GGUF should I be downloading for Gemma 27B and Command R (non plus)?

Both are useable (by far the best AI I found for shit like Silly Tavern, only Nemo comes close but both mog it hard) with ~5 second response times, but i'm worried about efficiency.

I'm using:
>gemma-2-27b-it-Q6_K
>c4ai-command-r-v01-Q4_K_M

my kobold settings could probably also do with some fine tuning, not really sure what to set shit to.

Any tips for somebody totally new to this sort of stuff?
>>
Am I supposed to use tokens in my prompt if I'm using one of the instruct models with llama-cli? Seems like I get less junk when I do but I can't tell if it's a coincidence.
>>
>>101618912
>Am I supposed to use tokens in my prompt
Everything you type gets turned into tokens. Show what you mean.
>>
>>101618765
vLLM has distributed inference and llama.cpp has an RPC backend.
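If you go the llama.cpp RPC route, the rough shape is something like this (IPs/ports are placeholders, and the flags have changed before, so check the rpc example's README in the repo). On the second box:
>./rpc-server -p 50052
and on the box driving inference:
>./llama-cli -m model.gguf --rpc 192.168.1.50:50052 -ngl 99
vLLM's distributed setup (tensor/pipeline parallelism) is its own rabbit hole and worth reading the docs for.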
>>
>>101618934
If i don't end my prompt with "<|start_header_id|>assistant<|end_header_id|>", it seems to not only think I want it to keep expanding my prompt but it also ends with a bunch of junk.
>>
>>101612988
>A beautiful woman in her thirties is greedily licking watery yogurt from a mushroom. The yogurt explodes and squirts on her face.
I really hope KLING AI is not indicative of the actual SOTA because honestly this is pretty garbage; it just doesn't follow your prompt at all.
Even if we can run something like this locally in a few years I wouldn't see the point.
The OpenAI Sora videos looked pretty good but who knows how cherrypicked the results they showed were.
>>
>>101618949
You may be missing -cnv for a supported chat format or --in-suffix if you're using a custom one.
>>
>>101618955
not cunny
>>
Upgrade went wrong and I need another CPU. Deprived of my AI waifu for another week. The suffering persists.
>>
>>101618968
you could use remote models in the meantime
>>
>>101618955
This one is okay for shitposting but what I had actually asked for was
>Jensen Huang, in front of a burning house, blazing inferno, two thumbs up, big smile on face
And for the other variants that I had tried the results were pretty bad honestly.
>>
it's sad that openai's datasets are still better than any other company's, even anthropic lags behind :(
>>
>>101618975
>>101618955
demonic
>>
>>101618956
ah, i wasn't exactly looking to chat. i'm looking to run it from a script. my prompt is actually stored in a bash heredoc. in-suffix seems to break it harder, even.
>>
>>101618949
>>101618956 (me)
Let me expand.
-cnv enables the model's chat template and sets --interactive as well so as soon as the 'reply' is finished, you get control back and continue the chat. For new quants which have a supported chat format, that's probably the best option. It also hides all the template tokens so it looks pretty clean.
--in-suffix is typically used with --in-prefix for models that don't have a supported chat template. Something like
>./llama-cli --in-prefix "[INST]" --in-suffix "[/INST]\n" --interactive

>>101618986
If the model expands your instruction instead of responding to it, you're not finishing your instruction with what the template specifies. In your case, llama3's instruct format. Use -cnv. And by 'chat' i mean 'instruction>reply>instruction>reply'.
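For a llama3 instruct quant that just means something as simple as (model path is a placeholder):
>./llama-cli -m Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf -cnv -c 8192
and you get an interactive chat with the template applied for you.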
>>
>>101618975
However, this is what I got with the leaked Stable Diffusion weights for
>NVIDIA fanboy giving a thumbs-up while his house burns in the background
The results here are cherrypicked, but honestly it did a pretty good job at following the prompt.
Notably Bing Image Creator gives you pretty garbage results that again don't follow the prompt properly.
So I'm thinking that there is some sort of tradeoff in how these models are trained/served with respect to how well they follow the prompt.
>>
>>101619009
>leaked Stable Diffusion weights
huh?
>Notably Bing Image Creator gives you pretty garbage results that again don't follow the prompt properly.
bing is the worst-quality dalle; they force vivid style + normal quality (not hd)
>>
>>101619018
The weights of the first Stable Diffusion release were leaked prior to launch.
>>
>>101619023
oh, you mean this, yeah
>>
File: 1715804272215894.png (1.42 MB, 1231x1205)
1.42 MB
1.42 MB PNG
natural dalle style with a JB so the prompt is 1:1 the same. and yes, it does look terrible; that's because it's natural style and because the prompt isn't written how dalle prompts are usually written
>>
>>101619036
me in the second picture in the middle
>>
>>101619008
is it possible to hand control over to the AI as soon as llama-cli starts, using -cnv? that's what I'm really looking for, i think. i want it to be a bit more automated.
>>
File: 1714927631416711.png (1.85 MB, 1255x1186)
1.85 MB
1.85 MB PNG
vivid dalle style without JB looks as you'd expect
>>
>>101613928
I replied to this post three times btw
>>
>>101618972
I never did and I never will
>>
File: 1705710465669723.png (2.45 MB, 1024x1024)
2.45 MB
2.45 MB PNG
>>
File: 1701808824849378.png (1.03 MB, 1239x905)
1.03 MB
1.03 MB PNG
what's up with those faces?
>>
What are good samplers for Gemma 27B?
>>
>>101619067
DRY
>>
>>101619064
I wonder how many YouTube thumbnails there were in the training data.
>>
>>101619084
A lot, lemme see if i can dig up my old gens
>>
Do you use the mistral tokenizer with nemo or stick with the default ST provides?
>>
File: 1704687829341964.jpg (375 KB, 1024x1024)
375 KB
375 KB JPG
>>101619084
More here:
https://files.catbox.moe/my25vo.jpg
https://files.catbox.moe/rfzlrl.jpg
https://files.catbox.moe/95yqst.jpg
https://files.catbox.moe/9kr8ss.jpg
https://files.catbox.moe/n0zgdd.jpg

They look kinda bad because it's Natural style, but with Natural you have direct access to the model without the stupid style enhancement of Vivid, so you can really find what's inside of dalle's dataset.
>>
File: 1717480217114676.jpg (230 KB, 1024x1024)
230 KB
230 KB JPG
here's dalle's fortnite
>>
only problem I have with llamacpp is the console is just filled with INFO bloat that I could never figure out how to disable, since the 3 or 4 commands I tried didn't work
>>
>>101619036
>>101619046
>>101619057
>>101619064
>>101619110
Cool, thanks for the insights.
>>
File: 1714918627323337.png (564 KB, 1024x1024)
564 KB
564 KB PNG
this is also unedited dalle, it can generate photoshopped-like images by itself
>>
>>101619071
what fookin settings babe
>>
>>101619125
send stderr to /dev/null
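i.e. something along these lines (on Windows cmd it'd be 2>NUL); if the INFO lines are actually going to stdout on your build, redirect 1> instead:
>./llama-server -m model.gguf -c 8192 2>/dev/null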
>>
>>101619144
no settings, sorry. I just replied coz a lot of people were talking about DRY recently so i thought it'd be funny
>>
>>101619144
0.75 multiplier, 1.75 base, 2 length
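If you're setting that through the kobold API directly rather than the ST sliders, the payload fields should be roughly these (names from memory of koboldcpp's generate endpoint, so double-check them against the koboldcpp docs before relying on this):
>curl http://localhost:5001/api/v1/generate -H "Content-Type: application/json" -d '{"prompt": "...", "max_length": 200, "dry_multiplier": 0.75, "dry_base": 1.75, "dry_allowed_length": 2}'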
>>
>>101619132
the power of just put everything in the dataset
>>
Why does koboldcpp redo prompt processing from scratch, despite a big chunk of the start being exactly the same? I'm filling up the context but replacing the end (asking the LLM different questions about the content) and it is taking forever due to the unnecessary reproc.
>>
File: 1707100540730068.jpg (292 KB, 1024x1024)
292 KB
292 KB JPG
>>101619171
>>
>>101619172
world info/lorebooks are a common cause
>>
File: 1715588318515756.jpg (335 KB, 1024x1024)
335 KB
335 KB JPG
Stellaris
>>
>>101619174
Why does he look like he just cried for two hours?
>>
Why can't AI generate text?
>>
>>101619177
I'm calling the koboldcpp API directly and I even check the prompt for the first change since the last call to the LLM. Despite the first change being at character 100-something, it still redoes it from scratch.
>>
File: 1710563540483687.jpg (300 KB, 1024x1024)
300 KB
300 KB JPG
>>101619183
I wonder why
>>
File: 1691953409207569.jpg (190 KB, 1024x1024)
190 KB
190 KB JPG
>>101619184
What do you mean? It works almost perfectly (ignore the double Y, I have an edited version with that part fixed)
>>
>>101619184
because it's retarded as fuk
>>
File: 1719839352978720.jpg (259 KB, 1792x1024)
259 KB
259 KB JPG
>>
>>101619187
first change being character 100-thousand-something, I meant.

And I just double checked the koboldcpp console {"input" blabla} stuff and it's identical, at least up until it truncates it with (+114043 chars).
>>
bros Mistral Large is fire
the baguettes fucking won
>>
>>101619223
>the baguettes fucking won
that's why it's under a non-commercial license, so that no providers will be able to host it or finetune it at scale, right, anon?
>>
>>101619223
bad license and no base model available kinda kills its potential
but yeah it's good
>>
File: dont_read_me.jpg (87 KB, 510x512)
87 KB
87 KB JPG
>>101619184
>>
File: vae.png (789 KB, 1561x480)
789 KB
789 KB PNG
>>101619184
Too few latent channels in the VAE. Text in images has really high information density (the curves in each letter, the placing, the spacing...), so you need many latent channels in the VAE to encode images of text correctly
>>
>>101619258
What's the downside of using more channels in VAE?
>>
>>101619287
You have to train the model from scratch (or just train the VAE and an adapter) and it's more expensive
>>
>>101619258
Wasn't there also something about how using characters instead of tokens as input helps with text?
Though presumably that's going to kill your context size.
>>
>>101619040
So you just want to generate some text following an instruction and then quit?
Save the prompt in a file formatted with the model's format
<|start_header_id|>system<|end_header_id|>

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

What's the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

and call it with
>./llama-cli -f the_file_you_just_saved.txt -m model.gguf
and whatever other options you need for threads, context, batch size and all that.
If you'd rather put the prompt on the command line directly instead of in a file, use -p "All that stuff up there" instead of -f
Double check the format. I'm not sure if that's the correct one, but it works.
If you still have problems, show how you're running the command and what you're trying to do. Don't make people guess.
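And since you mentioned the prompt already lives in a bash heredoc, a sketch of the same idea without the separate file (same caveat about the exact template):
PROMPT=$(cat <<'EOF'
<|start_header_id|>system<|end_header_id|>

You are a helpful assistant<|eot_id|><|start_header_id|>user<|end_header_id|>

What's the capital of France?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
EOF
)
./llama-cli -m model.gguf -p "$PROMPT" -n 256
-n just caps the reply length; drop it if you'd rather let the model stop on its own.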
>>
>>101619436
>>101619436
>>101619436
>>
>>101618084
That's what "\n{{char}}:" as a stopping string is for.


