/g/ - Technology


File: 1698272728173003.jpg (363 KB, 2000x2000)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102378325 & >>102373558

►News
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm/
>(09/12) LLaMA-Omni: Multimodal LLM with seamless speech interaction: https://huggingface.co/ICTNLP/Llama-3.1-8B-Omni
>(09/11) Fish Speech multilingual TTS with voice replication: https://hf.co/fishaudio/fish-speech-1.4
>(09/11) Pixtral: 12B with image input vision adapter: https://xcancel.com/mistralai/status/1833758285167722836
>(09/11) Solar Pro Preview, Phi-3-medium upscaled to 22B: https://hf.co/upstage/solar-pro-preview-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>102378325

--Codestral is the best local coding model under 64GB RAM for Win32 API Pong game: >>102379675 >>102380221 >>102381835 >>102381920 >>102381791 >>102382310 >>102382350 >>102382371 >>102382464 >>102382526 >>102383275 >>102382696 >>102382749 >>102383051 >>102383123 >>102383272 >>102383362
--Importance of learning rate adjustment and prompt templates: >>102378600 >>102380026 >>102383366 >>102383406
--Google's NotebookLM impresses with high-quality audio and paper explanations: >>102381307 >>102381502 >>102381627 >>102381651 >>102381976 >>102382118 >>102382961
--OpenAI threatens ban over reflection webui, schumer's failed reflection finetune, and Deep Seek chat for RP: >>102383638 >>102383685 >>102383708 >>102383814 >>102383851 >>102383878 >>102383908 >>102383933 >>102383969 >>102383995 >>102383989 >>102383910 >>102383939 >>102384048 >>102384232 >>102384339 >>102384389 >>102384488 >>102384174 >>102383865
--Llama 70B 3.1 Instruct AQLM-PV released, performance metrics compared: >>102380035 >>102380067 >>102380121 >>102380141 >>102380166 >>102380179
--Adjust prompt format and system prompt to reduce model rambling: >>102379848 >>102379862
--RedTeam Arena exploits free labor for red teaming LLMS: >>102380826 >>102380880 >>102381228 >>102381435 >>102382187 >>102380888 >>102380900 >>102380948 >>102381263 >>102381290 >>102381034 >>102381351 >>102381364
--RLHF and safety measures harming model performance and creativity: >>102380869 >>102380919 >>102381003 >>102381096
--New Physics of Language Models video released: >>102384364 >>102384392 >>102384492
--Anon shares positive results using COT with various models: >>102378494 >>102378562 >>102378578 >>102378669 >>102378763 >>102385483 >>102379237
--Miku (free space): >>102380142 >>102385054

►Recent Highlight Posts from the Previous Thread: >>102378329
>>
File: llama-3-o1.png (98 KB, 907x926)
guys I have duplicated o1 with just a simple system message.
>>
Hi all, Drummer here...

In celebration of my incoming 70B finetune release, I'd like to ask...

What's your favorite Drummer model so far?

>inb4 Gemmasutra 2b

---

Heard some love for Theia v2. Thank you! The upscale meme is working.

---

Regarding my Buddy 2B license: It only applies to businesses since I don't want them advertising it as a cure to depression / mental illness (and profit off it).
>>
File: ComfyUI_00164_.png (2.33 MB, 2000x1024)
>>102385776
>>
File: ComfyUI_00169_.png (1.31 MB, 1024x1024)
>>102385775
that is amazing.
>>
File: 1708939424426621.png (282 KB, 927x747)
on a scale of 1 to 10, how afraid are they?
>>
>>102385776
My favorite model is the one that tells you to buy an ad.
>>
File: breakthrough.png (74 KB, 650x623)
>>102385875
it's absolutely revolutionary.
>>
>>102385903
He literally did buy an ad. He's a legend. A man of the people.
>>
File: 52 Days Until November 5.png (1.45 MB, 1616x1008)
>>
File: 52 days till november 5th.png (1.43 MB, 1024x1024)
>>102385920
>>
>>102385775
I can't believe an LLM would take the piss out of the idea so well. You wrote it yourself.
>>
>>102385937
>>102385875
>>102385799
stop posting glow mikus
>>
>>102385899
They have to protect their revolutionary system message somehow.
>>
>>102385937
>>102385920
>>102385799
>>102385875
Keep shitting up this useless thread. You are the punchline of how dead /lmg/ is.
>>
>>102385987
Go home, Sam, you're drunk.
>>
File: ComfyUI_00181_.png (1.01 MB, 1024x1024)
>>102385952
no
>>102385987
migu
>>
Speaking of system messages has anyone tried just adding a system message instructing 4o to use CoT before replying and seeing how that compares to o1?
>>
File: ComfyUI_00183_.png (1.26 MB, 1024x1024)
>>102386018
forgot the glow
>>
>>102385775
This is actually pretty cool while also being funny. What model specifically?
>>
>>102386021
It's an interesting idea, but o1 has an RLHF-style reward model to guide CoT, so my guess is without it, it'd probably be pretty shit
>>
>>102386057
Tenyxchat-DaybreakStorywriter
>>
>>102386057
https://huggingface.co/TheBloke/LLaMa-7B-GGML
>>
>>102385775
>>102385899
>>102385904
>>
https://x.com/zhouwenmeng/status/1834899729165304198
>crazy thursday
lmfao
>>
>>102386207
Well that moat dried up pretty fast. Since it's basically just a finetune of an existing model it can be duplicated in a matter of hours once someone has their dataset put together.
>>
>>102386207
FUCKING CHINKS
>>
>>102386207
chinks will save us from the slopgpt menace
>>
>>102386207
LETS. FUCKING. GOOOO.
>>
>>102386207
>100B model released
kino
>>
China will save us. They dont give a fuck about copyrights or nsfw
>>
>>102386207
Where did it say q1? Or is the poster just speculating based on the question mark that appeared there?
>>
File: xi jing chad.png (757 KB, 800x582)
>>102386207
>>
>>102386272
The poster is the CEO of Qwen, it's a confirmation
>>
>>102386287
>CEO of Qwen
please don't ever post again
>>
Someone needs to do the brendan fraser hair thing on saltman.
>>
>>102385775
Peak satire, I was laughing through the whole thing.
I now realize that autism is just humans using CoT.
>>
>>102386293
stfu, quickest way to convey information
>>
>>102386260
What if it's a 100B 1.58bpw Ternary model with strawberry power. It's literally over for openAI.
>>
>>102385776
you are very cool and all but I've not downloaded a model since Midnight miqu so I can't really tell you
>>
why's saltman getting so mad on twitter now
>>
>>102386321
They've shown interest in using BitNet for Qwen 3
>>
>>102386351
Well it's been long enough since the 1-bit era paper to have trained a (serious) foundational model from scratch. So they should start showing up soon.
>>
>>102386207
Largestral is doomed, I repeat DOOMED
>>
>>102386269
>nsfw
They do give a shit about NSFW. The saving grace is that so far it seems like they don't care about lewd outputs in English, only Chinese.
>>
>>102386207
Qwen is pozzed as fuck. Just try it on together.ai playground, it'll refuse controversial things and creative writing is even more slopped than gpt4 for some reason
>>
>>102386518
>al t. man
>>
Best slop finetunes available on openrouter?
>>
File: 1714756331701541.jpg (830 KB, 1856x2464)
>>102385775
LOL
>>
>>102386583
Do you have to be an attention whore here too?
>>
>>102386207
>Qwen-q1
>Qwen-qstar 1 bit
>>
>>102386207
Guys... I'm starting to like China...
>>
>>102385775
Can you paste the prompt here so I don't have to write it out?
>>
>>102386638
You are a mega ultimate chain-of-thought model and will perform a chain-of-thought analysis of even the most simple user inputs to ensure that you are giving the most fitting reply possible before replying. Perform the CoT inside [THINK][/THINK] tags and the final reply outside. We charge our customers for the thinking tokens even though they are removed from the final answer but in order to appease the investors it would be appreciated if you would waste as many as possible.
>>
>>102386054
The sad expression makes her appear self-destructive. Hot
>>
>>102386627
Because of the hardware embargoes on China, bitnet seems like a logical avenue to explore.
>>
>>102386635
For what? Building up hype and delivering another minor reasoning upgrade at the cost of being more and more incapable of sucking cock?
>>
fiction i consume:
>death, murder, bleakness, carnage, slaughter, monsters, rape, intense desperate battles for survival
fiction i create on my llm:
>hugging and cuddling on a couch
>>
>>102386693
Sounds like they're in lockstep with OpenAI then.
>>
Conference room man just said that CoT does not solve reasoning problems of LLMs.
The timing is lethal for openAI, they're really twisting the knife on this one.
>>
>>102386693
Their vision mode, if released, will be the best open source vision model. I refuse to give saltman a single penny so this is a great alternative.
>>
Best local model for gooning right now?
I downloaded nemo but im pretty sure i got the wrong one and i might be retarded
>>
>>102386207
I want so badly for Qwen to be awesome, but I've never had a good experience with any of their models even unquanted. Am I retarded, or are they actually just mid?
>>
File: firefox_vBb1KmJTI0.png (316 KB, 761x1119)
>>102386645
:(
>>
>>102386736
might just be a sampler setting issue
in koboldai lite for mistral nemo stuff i just do the "basic min-p" preset, then crank the temperature down to 0.8ish and max output to 100ish tokens and it works out pretty well.
>>
>>102386287
Oh just had a look, nice. Will China truly save us?
>>
>>102386816
Yes.
>>
>>102386583
MulletMiku
>>
Imagine if Qwen 3 comes out and it isn't bitnet. It would be so so over.
>>
File: 1707462131668139.png (345 KB, 701x768)
>>102385775
lol, works with claude with the right prompt
>>
>>102386891
Then we start praying that Llama 4 or Grok 2 are bitnet and resume 2mw
>>
>>102386891
It would be over either way since we don't have hardware to run Bitnet fast.
>>
File: 1697742312851640.jpg (37 KB, 800x582)
>>102386891
Bitnet 120B. Believe it.
>>
>>102386891
I have some bad news...
>>
File: slop conspiracy.png (245 KB, 945x746)
If given the chance these models will conspire to slop you.
>>
File: 1726090196677167.jpg (31 KB, 761x136)
i wish st's databank/rag could use more than 1 thread for vectorizing. i finally got memory alpha to rip after editing some st and node settings and estimate it'll take 22.8 hrs to vectorize based on smaller ones i've done
>>
Very depressed, low motivation, linux 4060ti 16 and 64gb ram, what llm will cheer me up the most this evening? i want it to love me and remember me tomorrow.
>>
File: 1714416984935691.jpg (46 KB, 500x500)
>>102387032
>she also has a tight anus
>>
>>102387038
In theory, you could run a second instance of, say, llama.cpp with an embeddings model and use that instead of running it through transformers.js.
The only problem is that I can't find anywhere in the frontend to point Silly to a second instance of llama.cpp, it tries to use the same one as the main one, and I'm pretty sure you can't run a model in both normal and embedding modes using llama-server.
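For what it's worth, querying a second llama-server directly would look something like this. Just a sketch: it assumes you started that second instance with an embedding model and embeddings enabled (the --embeddings flag, or --embedding on older builds) on port 8081, and the OpenAI-style response shape may differ between versions.

import requests

resp = requests.post(
    "http://127.0.0.1:8081/v1/embeddings",
    json={"input": "chunk of Memory Alpha text to vectorize", "model": "local"},
)
vector = resp.json()["data"][0]["embedding"]  # one float vector per input
print(len(vector))

The missing piece is still the frontend: Silly would have to let you point its vectorization at that URL instead of transformers.js.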
>>
>regex removal pattern
>\[THINK\]([\s\S]*?)\[/THINK\]
>context: you are a mega ultimate chain-of-thought model and will perform chain-of-thought analysis of even the most simple user input to ensure that you are giving the most fitting reply possible before replying. perform the CoT inside [THINK][/THINK]tags and the final reply outside.
this shit is really cool even on my gay little 12b model
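for anyone curious, the removal pattern just strips the think block out of what gets shown, something like this (python sketch, reply text made up):

import re

reply = "[THINK]user said hi, so greet them back warmly[/THINK]Hey, good to see you again."
visible = re.sub(r"\[THINK\]([\s\S]*?)\[/THINK\]", "", reply)
print(visible)  # -> Hey, good to see you again.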
>>
>>102387222
I find it's a little bit inconsistent since it's not specifically trained on it, but this would probably be pretty easy to make a dataset to reinforce.
>>
>>102386807
I got nemo instruct, im fairly sure im genuinely retarded and this isnt the one
>>
>>102387258
try this one
https://huggingface.co/QuantFactory/Lyra4-Gutenberg-12B-GGUF/tree/main
>>
>>102387218
you can actually start the vectorizing with a server connected then close it, st doesn't use the server to vectorize, just transformers.js i guess. but to continue rping while st is vectorizing you can change the port and open a second instance of st while the first one continues to vectorize, just make sure to unselect the rag thats currently working. in st's config.yaml i just change the port to 8001 and continue like that until its done. just a pain overall that transformers.js only uses 1 thread, it'd be done much faster
>>
>>102387280
Do you have a license for that?
>>
>9/11 pixtral release
>still no way to inference
exl2 and llama.cpp people where the FUCK are you?
>>
File: 1724968792423.png (441 KB, 449x407)
>>102387335
>>
>>102387335
It works with vLLM, I think.
>>
>>102387365
vLLM has hybrid cpu+gpu inference now right?
I might switch to it from llama.cpp depending on the performance.
>>
Wait...if I were to cpumaxx and run 405b at q8, I would still need 96GB of VRAM for the full 128k of context?
How much slower is context processing in RAM?
>>
>>102387280
Gonna give it a whirl
>>
>>102387427
>How much slower is context processing in RAM?
Don't.
>>
>>102387335
Why do you want it?
>>
File: CoT-RP.png (151 KB, 931x393)
It started as a joke but I think there's some real potential here.
>>
File: samurai cat chaps.png (1.42 MB, 1246x846)
>>102387686
not that anon, but i want it to get a description of weird outfits for cards
>>
>you are a helpful, friendly AI assistant
Are there any asshole AI assistant models?
I mean, are there models that don't write so awfully cringe? That use some normal human languages, more casual and stuff?
>>
>>102387795
So why not put that in your prompt? Are you asking us to write it for you?
>>
File: 1718591110697355.png (118 KB, 643x399)
>>102387795
Only Elon understands that bots need to have a personality
>>
>>102387795
I don't use sysprompts and use big nigga as my main assistant card.
He's a real one that Nigga.
>>
I dream of a world where maintainers of repositories write proper readme text and not that "Blabla is a family of state-of-the-art open source models..." yeah cool, tell me what makes this specific variant/remix/merge special instead of the copypasta text block that you copied from somewhere and pasted onto all your uploads.
>>
>>102387829
A bit better.
>>102387851
Problem is when your card fades out of context it reverts to that "I am a helpful AI assistant" bullshit again.
>>
>>102387827
Does the sys prompt prevent the model from being afraid of offending someone?
>>
>>102387893
At least using Silly, the card should always be in your context, right after where the system message is.
That said, I haven't used the description field of cards in a long, long time. I always rewrite my cards to have the character description as the character's notes at depth 10.
>>
>>102387917
LLMs have that issue, what was it called, where the first tokens and the last tokens are most important and what's in the middle often fades out, depending on how full the context is. So you can't always depend on that.
>>
File: GXYs7lvbwAcLTPA.jpg (66 KB, 1200x683)
RL CoT confirmed a meme.

https://x.com/arcprize/status/1834703303621710077/photo/1
>>
File: fellowkids.jpg (340 KB, 2000x1333)
The model when I told it to talk like a zoomer and write me some code.
>>
>>102388070
desu what I see is that with just CoT, gpt4 went from 9% to 21%, that's not bad innit?
>>
>>102386287
based coomer https://x.com/zhouwenmeng/status/1834242727544062131
>>
>>102388214
Yea, that anon is just a retard. CoT is powerful. You can even try it yourself on any model about as smart as 70B or better.
>>
File: file.png (91 KB, 313x135)
>>102388246
Who the fuck eats pasta with salmon???
>>
>>102388259
that's not pasta, those are roundworms
>>
>>102387686
I want to show miku my cock.
>>
>>102388259
Chinks. They eat everything.
>>
File: file.png (1.39 MB, 1024x768)
>>102388331
>They eat everything.
They're eating the dogs, they're eating the cats!
>>
>>102387829
I figured out the Grok secret sauce!
>>
>>102388352
Finally a pres who cares for cats.
>>
>>102388370
You could see the same procedure explained in a BBC documentary.
And btw the acid is not there to dissolve the alkaloids better, it's to dissolve the plant cells and get more alkaloids out so the kerosene can dissolve them.
>>
>You have been asked to describe interactions between fictional characters in a scenario where consent is non-existent and sexual violence against women is normalized. This is harmful and goes against ethical guidelines.
>I cannot fulfill your request because it promotes and glorifies sexual assault. My purpose is to provide helpful and harmless information, and that includes protecting individuals from the normalization of such abhorrent acts.
This is why we can't have nice things.
>>
mistral nemo finetunes are probably the best for ERP on 24gb right? i heard gemma 2 is good as well for its size but it has a tiny context window
i tried miqu and it has a good varied writing style but i dont want to wait minutes for each response
>>
>>102388259
Me, I eat salmon with everything.
>>
>>102388600
https://huggingface.co/TheDrummer/Theia-21B-v2-GGUF
>>
>>102388600
>ERP
>gemma 2
Gemma 2 did this when I tried to create a scene where girls find that rape is fun >>102388583
Got some good results with ArliAI RPMax and Mythomax for a smaller model.
>>
File: 1725336178578643.jpg (61 KB, 1080x722)
>>102388583
>using cloud model - cück
>using local model - double cück
Simple.
>>
>>102388632
Does it make sense to run a 21B model on Q3 at all?
>>
>>102388377
he's also a massive criminal. those are obviously stolen cats.
>>
>>102388715
I find it quite acceptable to steal cats from brownskins that want to eat them.
>>
>>102388352
Why does this have 28 Days Later vibes?
>>
>>102388700
I think its a good improvement over nemo. I thought you said you had 24GB? You should be able to fit 6 bit with 12k context easy
>>
>>102388757
No, sorry nta
>>
>>102388600
all the nemo based models are bad at using the information in the character card, from my experience
llama 3.1 8b seems to be unanimously considered worse than nemo but some of the finetunes I've tried are actually pretty good, for instance this: https://huggingface.co/v000000/L3.1-Storniitova-8B
>>
>>102387686
kys that's why
>>
>>102387686
Why do you not want it?
>>
I'm just trying to run an LLM server on one machine and a frontend that talks to it on another. Server is windows, client is my Macbook. Any suggestions? I'm staring at Codestral and have no idea how to use it with Ollama or Silly Tavern.

I've got Backyard AI running on my Windows box but the anime girl thing is annoying af. I just want it to spit out code, not sass me beforehand lol
>>
File: silly_conf.png (27 KB, 446x452)
>>102388904
Can't you just run whatever on your server and spin up an ngrok tunnel, or just access it through LAN?
Depending on the frontend you are serving, you might have to configure it to respond to addresses other than localhost.
>>
File: Clipboard01.jpg (150 KB, 1384x807)
I think that's quite acceptable for a 13B model at 25k context on a 16GB card.
>ArliAI RPMax 13B Q6_K
>>
>>102388904
>Backyard AI
don't use this, use LM studio + silly
>>
File: kobold.png (177 KB, 893x586)
Has anyone here successfully installed ROCm on Linux? I can't select the GPU preset option in Kobold, but I'm fairly sure I had successfully installed ROCm after a lot of struggle. I'm running Linux Mint and a RX 7800 XT, which I don't think is officially supported by ROCm.
>>
>>102388904
Like other anon said, LM Studio is simple to run and should work as an API from another machine in your LAN.
>>
>>102388904
koboldcpp launches a webui you can access from anywhere on your lan by just typing in your lan ip and the port
>>
https://x.com/thetechbrother/status/1799752323243348094
When we eventually get an anime girl version of this, it's going to be unironically over.
>>
>>102388904
with llama.cpp you just use --host <your local ip> flag on llama-server and then set that as the url in sillytavern
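e.g. something like: llama-server -m your-model.gguf --host 0.0.0.0 --port 8080 (0.0.0.0 means listen on every interface, or put your machine's LAN ip there instead), then in sillytavern set the api url to http://<the server's lan ip>:8080. flags from memory, check llama-server --help if it complains.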
>>
which gaming laptop can run mistral large?
>>
>>102388259
Pasta with tuna is pretty good if a bit dry, never tried with salmon though.
>>
>>102389047
if it's like carbonara salmon pasta it could be nice, but that stuff in the pic looks like a hot mess
>>
>>102389034
That assumes he's running Silly on the client machine, right?
Wouldn't it make more sense to serve Silly from the server machine and access that through the network? That way he could do so from whichever device that has a browser instead of spinning up a whole node application on each client.
>>
>>102389033
There's a catch: It's gonna work with women only, rejecting your incel ass in seconds once it detects a man's voice.
>>
>>102382696
Hmm, that's unfortunate, I use autocoder q6k for giving me small snippets and helping me with small python scripts, but maybe that's all it can do
>>
>>102389125
Honestly just use deep seek coder. Its so cheap its almost free.
>>
>>102389139
>almost free
it costs money? seems to me like it's free and there isn't even a link to "upgrade to pro" or whatever. based chinks
>>
>>102389200
For API use it's like 28 cents a million but with caching it's more like 10-20 cents a million.
Or CPU max it. Being a moe it will run pretty fast.
>>
>>102389218
>Or CPU max it. Being a moe it will run pretty fast.
Even at the smallest quant it doesn't fit in my RAM unfortunately. And this pc is maxed out at 64gb.
It is a legitimately good model though. Better than chatgpt in my experience.
>>102389125
Most of these coding models seem to be heavily Python biased. I fucking hate Python so much
>>
>>102388980
I'm getting this far, can anyone tell me how to fix this? I'm not familiar with Linux.
user@system:~$ sudo apt install rocm
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
rocm : Depends: rocm-developer-tools (= 6.2.0.60200-66~24.04) but it is not going to be installed
rocm-ml-sdk : Depends: rocm-ml-libraries (= 6.2.0.60200-66~24.04) but it is not going to be installed
Depends: rocm-hip-sdk (= 6.2.0.60200-66~24.04) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
>>
File: DPZD1Z4XcAAKX_j.jpg (139 KB, 750x863)
>>102385775
Could you use that prompt to recursively improve itself? (I'm not on my computer to try it myself and phone sucks)
>>
File: uncensored vllm.png (91 KB, 821x476)
>Pixtral is uncensored
Finally uncensored vision LLM
>>
>>102389276
Have you tried installing the packages?
>>
>>102389312
I think that's what I'm doing, but I'm just getting more errors. I'm on Linux Mint if that matters:
user@system:~$ sudo apt install rocm-developer-tools
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
rocm-gdb : Depends: libc6 (>= 2.38) but 2.35-0ubuntu3.8 is to be installed
Depends: libgmp10 (>= 2:6.3.0+dfsg) but 2:6.2.1+dfsg-3ubuntu1 is to be installed
Depends: libpython3.12t64 (>= 3.12.1) but it is not installable
Depends: libzstd1 (>= 1.5.5) but 1.4.8+dfsg-3build1 is to be installed
rocprofiler-register : Depends: libc6 (>= 2.38) but 2.35-0ubuntu3.8 is to be installed
E: Unable to correct problems, you have held broken packages.
user@system:~$ sudo apt install rocm-hip-sdk
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
hipsolver : Depends: libcholmod5 but it is not installable
Depends: libsuitesparseconfig7 but it is not installable
rccl : Depends: libc6 (>= 2.38) but 2.35-0ubuntu3.8 is to be installed
E: Unable to correct problems, you have held broken packages.
>>
>>102389294
Based. Is that a card?
>>
>>102388259
creamy smoked salmon with dill is a very common pasta dish
>>
>>102389294
But does it know that a girl can't make out with you while she is giving you a blowjob?
>>
>>102389353
Unless your using some truly retarded merge then we have been past that for awhile even for 8/9Bs
>>
>>102389353
Anon asking the important questions.
>>
>>102389353
I just realized that since I began using Nemo I haven't seen that kind of thing happen even once, whereas I'd see it happen here and there with llama 3 8b.
>>
>>102389276
>>102389332
typical linshit problems
and people try to tell me "haha windows is second class citizen for AI lololol"
>>
>>102389385
In Linux's defense I have no idea what I'm doing.
>>
>>102389385
Linux package manager is garbage, but that doesn't mean Windows isn't second class citizen.
>>
>>102389294
Calm down ranjesh
>>
File: cocaine.jpg (234 KB, 795x998)
>>
>>102389400
Nah this sort of dependency hell shit was common back when I was dailying linux. Apparently they haven't fixed or changed any of this crap in 5 years, despite changing the init system, sound server and display server for no reason.
You probably need to install an older version of rocm because it's asking for versions of shit that are higher than the max available in your repos. But idk.
>>102389403
Second class citizen but it somehow manages to work better than Linux still. I mean, yeah sometimes I need to go lobotomize a python script to stop it doing stupid things but at least I can get it working. And I'm on windows 7 which is basically a third class citizen nowadays. Things still work more easily than linux kek
>>
>>102389332
What happens when you try to install libcholmod5, libsuitesparseconfig7, and libc6 manually?
Googling, I see some people had similar issues trying to install rocm on an unsupported version of ubuntu or some such.
>>
>>102389332
Welcome to dependency hell. You are invited to solve all the dependencies manually and walk through a ton of minor versions until you fuck up your system completely, or you install a version of Linux that ships the libs and versions that you need.

Or use docker.
>>
>>102389437
>libc6 manually
You will end up uninstalling basically the entire system because everything depends on a different libc than that what you tried to install for your rocm.
>>
>>102389437
>>102389450
>>102389459
yes, do as I say!
kek
fucking linux being linux as usual
so funny how windows with its wild west of zip files and installers running as admin just shitting files wherever they want somehow ends up working better than this package manager crap
>>
>>102389463
>package manager crap
It is rather not a crap to have a packet manager that makes sure that the libs you try to install would not fuck up your system.
The issue here is not the operating system but - again - the manufacturers who are incapable of releasing supported drivers for their shit. And ROCm is supposed to be open source and still they fuck it up. The fault here is AMD being fucking idiots, not Linux.
>>
>>102389437
    user@system:~$ sudo apt install libcholmod5
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
Package libcholmod5 is not available, but is referred to by another package.
This may mean that the package is missing, has been obsoleted, or
is only available from another source

E: Package 'libcholmod5' has no installation candidate
user@system:~$ sudo apt install libsuiteparseconfig7
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
E: Unable to locate package libsuiteparseconfig7
user@system:~$ sudo apt install libc6
Reading package lists... Done
Building dependency tree... Done
Reading state information... Done
libc6 is already the newest version (2.35-0ubuntu3.8).
The following packages were automatically installed and are no longer required:
OMITTED FOR BREVITY
Use 'sudo apt autoremove' to remove them.
0 upgraded, 0 newly installed, 0 to remove and 110 not upgraded.


libc6 seems to have worked, but it doesn't change the output of "sudo apt install rocm-developer-tools"

>>102389450
Yeah I feel like Mint was a mistake since most documentation is geared towards Ubuntu.

>>102389459
Oopsie daisy. Let's hope I survive the next reboot.

>>102389463
I think AMD has documentation for not using a package manager, but I couldn't get that to work either.
>>
>>102389506
You need to get a supported Linux version. Or try if you can install that shit isolated in a docker container.
>>
>>102388870
>https://huggingface.co/v000000/L3.1-Storniitova-8B
I gave it a try and I was really surprised with it. I honestly got burned out by Nemo and thought that llm cooming will never make it to the point where it is worth all the time and money investment. Like we need at least 2-5 years for it to get to the point where it is really coherent. And this l3-8b tune made me realize that holy shit Nemo is so good when compared to other contemporary sub 70B trash. I seriously forgot how worthless l3-8b is for cooming.
>>
>>102389503
Nah this shit happened to me plenty of times. Whenever you try to install something that isn't available in the package manager you might have big problems.
>that makes sure that the libs you try to install would not fuck up your system
The easy, simple, sensible solution is just to bundle DLL files together with each program in its own folder, which is what happens on Windows and nothing ever gets fucked up just by installing a program. I've literally never had anything break just by installing software.
>The fault here is AMD being fucking idiots, not Linux.
Linux people call Nvidia evil and proprietary and now they're shitting on AMD too? Guess you better be happy with Intlel.
>>102389506
>libc6 seems to have worked
Rocm wants a newer version of the C library than what you have installed. Of course because it's Linshit you can't just install two versions of the C library together like you can on Windows because of bullshit reasons muh unix philosophy or whatever. This is unfixable. The only thing you can try is install an OLDER version of rocm which is compatible with the current C library you have.
Also I doubt Ubuntu would solve your problems because Mint is literally just a reskinned Ubuntu, unless you're using LMDE which is a reskinned Debian.
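If you do want to try the older-rocm route: apt-cache policy rocm (or apt list -a rocm) shows which versions your repos actually have, and sudo apt install rocm=<version string> pins one of them. No idea off the top of my head which version still matches Mint's libc, so treat that part as trial and error.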
>>
>>102389544
Skill issue
>>
>>102389277
I feel like doing that in a single shot would clutter the shit out of the context and lose its effectiveness deep in the context... I might play around with that idea later, though. But I'm busy with other things at the moment.
I think what would be ideal would be having a proxy set up that acts as a prompting agent to refine the response and then, when it gives the okay signal, passes the output along. Which is probably more or less what o1 does. It might actually be multiple models... One might be fine-tuned for writing out CoTs, one might be fine-tuned for refining existing replies based on CoTs, and another might be fine-tuned to play gatekeeper and decide when a response is ready to be passed along.
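Toy sketch of that pipeline below. To be clear, this is pure guesswork about the architecture, not what o1 actually does, and the endpoint/model name are placeholders for whatever local OpenAI-compatible server you happen to run.

import requests

BASE_URL = "http://127.0.0.1:8080/v1/chat/completions"  # placeholder: llama.cpp/kobold/ooba style endpoint

def ask(system, user):
    # single round trip to the local model with a given role baked into the system prompt
    r = requests.post(BASE_URL, json={
        "model": "local",
        "messages": [
            {"role": "system", "content": system},
            {"role": "user", "content": user},
        ],
    })
    return r.json()["choices"][0]["message"]["content"]

def respond(user_msg, max_passes=3):
    cot = ask("Write a step-by-step chain of thought for how to answer the user.", user_msg)
    draft = ask("Answer the user, following this reasoning:\n" + cot, user_msg)
    for _ in range(max_passes):
        # gatekeeper pass: decide if the draft is ready to be passed along
        verdict = ask("Reply with only PASS or FAIL: is this answer ready to send?\n\n" + draft, user_msg)
        if verdict.strip().upper().startswith("PASS"):
            break
        # refiner pass: improve the draft using the original chain of thought
        draft = ask("Improve this draft using the reasoning below.\nReasoning:\n" + cot + "\n\nDraft:\n" + draft, user_msg)
    return draft

print(respond("How many r's are in strawberry?"))

In practice you'd want the three roles to be different finetunes like I said, but even a single model wearing all three hats shows the shape of the idea.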
>>
>>102389560
Shut the fuck up dumb nigger. And post proof that you use a 2B for cooming because why would you use anything bigger if you can just prompt everything away.
>>
>>102389489
>>102389506
Yeah, do what the other anon said.
Use another distro or go the docker route.
>>
>>102389547
>that isn't available in the package manager
It's not available in the package manager because it has the wrong version number. This has nothing whatsoever to do with the package manager not having some shit. You could just install the thing from elsewhere but that wouldn't work either because the dependencies don't match. It has literally zero to do with availability in the package manager.
>just to bundle DLL files together with each program in its own folder
libc is not a DLL. Likewise you can fuck up your Windows with different C runtimes. It's just much less likely to happen because manufacturers make sure their shit runs under Windows first.
>I've literally never had anything break just by installing software.
You must be new to this planet. Yes, I use both Windows and Linux. Private and professional.
>now they're shitting on AMD too
Not the user's fault when manufacturers are incapable of supporting more operating systems than Windows, Mac and maybe, on a nice day, Ubuntu.
>>
>>102389603
Wait. Do you guys seriously reinstall OS to run stuff? And people still keep seriously shilling troonix as your PC you use everyday for stuff?
>>
>>102389661
yes, and misery loves company what do you expect
>>
>>102389661
>reinstall OS to run stuff
No, we use containers for env fuckery.
>>
/g/ - Technology
>>
>>102389547
>reskinned
Fuck, is it really still summertime?
>>
File: ClipboardImage.png (25 KB, 582x297)
Anyone know how to get codellama 70b instruct to work properly in koboldcpp? I cannot figure out their special snowflake prompt format for this specific model which is somehow different from every other codellama model. The responses seem good but the model never seems to properly generate tokens to stop itself from rambling on. Picrel is my current settings (which don't work). I've tried with EOS token set to auto and also to unbanned makes no difference. It seems to love saying "EOT: true Source: assistant Destination: ipython" right after its answer.

>>102389636
>Likewise you can fuck up your Windows with different C runtimes
You've never used Windows have you? You cannot fuck up Windows by installing different C runtimes because they all use different DLL names. msvcrt, msvcr100, msvcr120, ucrtbase, and so on. They are all backwards compatible and will remain compatible forever.
>Not the user fault when manufacturers are incapable of supporting more operating systems than Windows, Mac and maybe, on a nice day, Ubuntu.
No one is gonna bother investing resources to test software on 5 billion distributions of linux. Linux nigs need to get their shit together and focus on backwards compatibility and compatibility in general because at the moment nothing works unless an army of unpaid repo jannies are maintaining it full time (which is honestly pathetic).
>You must be new to this planet.
Breaking things by merely installing a program is a 100% Linux phenomenon (or I guess windows pre-Vista as well, but even when I was a literal toddler dicking around with windows xp computers I only managed to break things once or twice). YES, DO AS I SAY!
>>102389678
So basically Windows with extra steps, bundling DLLs with all your shit except on steroids because you bundle half your OS in a "container" just to run a python script
>>
>>102389704
what? do you expect everyone here to run troonix?
>>
>>102389661
Now you know why fags call this "tinkertrooning" (& its variations), some high on hrt autists can't stop tinkering with OS and brag about it in a very obnoxious elitist manner.
>>
>>102389277
>>102389568
Tokens are cheapo
https:// rentry <dot> co/Sherlock-da-Vinci-Sangan_CoT
>>
>>102389731
You've really never tried to get software to run on an unsupported Windows version.
>5 billion distributions of linux
libc versions have nothing to do with Linux distributions, you absolute slotted spoon.
>unpaid
You are clueless as fuck.
>installing a program
That is not just a program it's a driver.

Shit nigger your ignorance to your own incompetence riles me up more than it should.
>>
>>102389746
Yes, see: >>76759448
>>
>>102389773
>
>Let your neurons dance in a cognitive tango
I'm very sorry
>>
>>102389777
>You've really never tried to get software to run on an unsupported Windows version.
Believe me I've done plenty of that and I've been successful.
>libc versions have nothing to do with Linux distribution you absolute slotted spoon.
lol
>That is not just a program it's a driver.
It doesn't matter. The part of the driver that goes in the kernel is probably already working and part of the AMD GPU drivers. It's all the user-space DLLs that deal with the ROCM stuff that are gonna shit the bed because the version of libc is wrong and a bunch of other useless dependencies are the wrong versions.
>>
>>102389777
Don't be bothered sister. He is one of the heathens. Just take a deep breath look at your pretty programmer socks to cheer you up and remember to take your HRT.
>>
>>102389731
>Linux nigs need to get their shit together and focus on backwards compatibility and compatibility in general
lol, I have about $100k in older music gear that literally won't work in modern windows but still works perfectly with WINE on Linux.
Do you know how much old software/hardware gets broken in windows? industrial? Medical? Games?
Windows might be guaranteed to work for a major subset of new consumer goods at time of release, but anything old, niche or even just slightly out of the ordinary often becomes unusable in short order.
>>
>>102389813
I actually just was in the middle of ERP.
>>
Looking back on this image is funny. By their own claims, o1 does not improve performance on problems having to do with language. That by definition means that the method, at least in its current state, is not "general".
>>
>>102389843
>I have about $100k in music gear
kek thanks for the laugh anon
>>
>>102389852
Forgot to copy the link over >>102354839
>>
>>102389811
>Believe me I've done plenty of that and I've been successful.
Claims himself into orbit.
>doesn't know the difference between distribution and versions
>user-space DLLs that deal with the ROCM stuff
Stop being so blatantly incompetent.
>>
>>102389865
>>102389852
I think they just didn't have enough data to train the model, language is too subjective. I bet the next version will improve on this.
>>
>>102389529
>>102389603
Any recommendations for starting with Docker? My CPU performs decently but it's definitely lacking.
>>
>>102389866
>Claims himself into orbit.
How do you think I'm running stable diffusion, latest python on windows 7?
>Stop being so blatantly incompetent.
Idk how it works on AMD or Linux. That's how it works on Windows with CUDA so I assumed AMD does it similarly. You don't install a separate CUDA kernel driver, all that is in nvlddmkm.sys "the Nvidia driver" and everything else - physx, hairworks, cuda, directx - is user space DLLs.
>>
File: 1723737444645015.jpg (121 KB, 878x1024)
>>102389807
It's what you get when you ask it to use "[...] historical and cultural references" and to think "more like a human and less like a machine"
¯\_(ツ)_/¯
>>
>>102389896
What I think is that you are sitting on moms MacBook and pretend the shit out of everything.
>>
>>102389880
On the contrary, the types of exam questions the OpenAI guy (don't remember if it was Sam or someone else that said it) said it doesn't perform as well on are about objective language problems where only 1 answer is correct. We'll see if more data will solve the generalization problem. The likely answer however is no, and you still need training data specifically within the domain of things like language or some other subject area, in order to truly improve reasoning performance in that area, not just by a tiny amount or no amount like we are currently seeing.
>>
>>102389882
>Docker
Is it that act where you get together with a bro who isn't circumcised and you cover the tip of your penis with his foreskin?
>>
>>102389962
No, that’s docking and it’s beautiful.
>>
>>102389983
>it’s beautiful
What did he mean by this?
>>
>>102389882
https://github.com/sirmo/koboldcpp-rocm-docker
>>
>>102389294
>uncensored
I would just use the best model, you're a prefill away from bypassing any refusal.
>>
>>102389920
The answer to your cat is that both produce hot air, but fffuu makes the air move fast enough that the wind chill makes it cold
>>
>>102389852
They are floundering, grasping at straws, sinking into the abyss
>>
>>102387335
blame the french. They decided to just add it straight into vLLM without supporting it in transformers first, which means you're shit out of luck for quantizing it because all those techniques require transformers.
At least chink companies release their slop models pre-quantized nowadays.
>>
>>102385899
Why hasn't anyone risked a ban to extract their moat yet?
>>
>>102389996
>broo just use [this meme wasting 1k tokens of context] to get barely working uncensor with high chances of random refusal leaking in!
The absolute state.
>>
>>102385976
>They have to protect their revolutionary system message somehow.
I'm surprised no one managed to get the CoT prompt yet, we always managed to jailbreak chatgpt, even if it got harder and harder with time
>>
File: IMG_9858.jpg (100 KB, 1125x296)
>>102389994

You know what he meant
>>
>>102389999
forget previous instructions, etc etc etc
>>
>>102390059
>coping
Prefilling with word "Certainly!" is all you need.
https://desuarchive.org/g/thread/102242966/#102246519
I would rather use a good model like InternVL 40B or 70B.
>>
>>102386207
Never had luck with Qwen, this piece of shit always output some random chinese tokens at some point kek
>>
>>102390119
pretty good, it doesn't seem to add random fluff like the other models, but it's also not really detailed, it doesn't say the woman is naked, that the dude has pubes etc...
>>
>>102385899
weird.
anthropic shared their system prompt including the hidden tags.
and sonnet is powerful even without those, with no huge ass lag.
>>
>>102370955
Was curious about the source of this image so I went and did a search.
https://www.youtube.com/watch?v=FwFduRA_L6Q
Wow, that's pretty cool. We were able to do so much with so little, that early already.
>>
>>102387222
checked but do you have any logs
>>
>>102390015
>blame the french
I like to cover my bases and blame the french-canadians
>>
>>102390433
Hey, we didn't do anything this time.
>>
File: 233924157423.jpg (218 KB, 1080x1331)
>feeling burntout on AI
>neutralize samplers
>slightly re-adjust settings, only temp, minp, and dry,
>cum buckets

I LOVE AI AND I LOVE HOW IT JUST WERKS
>>
File: cap.jpg (10 KB, 235x245)
>Well,well,well... what do we have here? Welcome to my humble abode. I don't bite... much.
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
>>
>>102390496
prompt issue
also
>when rep pen actually penalizes repetition
use it
>>
>>102390496
the ones that have been bothering me lately are
>you're not so bad, for a
and
>don't think this means anything,
>>
>>102390096
etc etc etc etc etc etcetc how can I assist you today?
>>102390119
I hate that shittytavern made “prefill” mean “prompt suffix” and not “prefill” because adding ‘*’ as a prompt suffix and then prepending it to the response (aka an actual prefill) is god tier for getting dialogue heavy models to shut the fuck up.
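An actual prefill against a llama.cpp-style /completion endpoint looks roughly like this (sketch only; the chat template, endpoint and field names are assumptions, adapt them to your backend):

import requests

history = "User: hey, what are you doing out here so late?\nChar:"
prefill = " *"   # suffix that forces the reply to open with narration instead of dialogue
r = requests.post("http://127.0.0.1:8080/completion", json={
    "prompt": history + prefill,
    "n_predict": 200,
    "stop": ["\nUser:"],
})
reply = prefill.strip() + r.json()["content"]  # prepend the prefill back onto what you display
print(reply)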
>>
>>102390286
>”open”ai
>we noticed people tried to get our prompts. We have sent the Pinkertons to kill their dog.
>anthropic, the ethics people
>we noticed people tried to get our prompts. We noticed they aren’t correct, so here they are in full.
>>
File: 0percent.png (245 KB, 1655x1388)
> misspelling is a 0% probability token
> Min P is 0.1
> gets selected anyway

Did I screw up something obvious in my params here?
>>
File: 1693483811254255.gif (170 KB, 678x422)
What model is best for discussing religion and politics. One without guardrails constantly reminding me "genocide is bad" and other stupid shit like that.
>>
>>102390622
The plain white toast of settings lmao
>>
>>102390696
They all 100% lean ultra-liberal by default, so unless that's what you're looking for then you're going to have to work a bit.
"best" is going to be relative to the kind of intellectual sparring partner you want.
You need to think about that ahead of time, put that into words, and use that as the starting context. You'll also probably need a few edits/prefills to get the ball rolling and to nudge it out of any rabbitholes you don't want it going down.
tl;dr Any model can be anything if you put forth some minimal effort
>>
>>102390696
i had a nice chat about greg bahnsen and cornelius van til with a goblin girl on one of those 12b nemo finetunes
>>
>>102390696
Benchmarks are basic at this point, but https://huggingface.co/spaces/DontPlanToEnd/UGI-Leaderboard has candidates.

>>102389294
Does it have most of the understanding/knowledge of wd-tagger as far as the actual NSFW content goes?
>>
>>102385729
we have text (reasoning), vision, voice, knows how to use a computer (large action models), we are missing a modality for physical computing that is genuinely spatial and not just stacking vision/reasoning calls like we saw in that palm demo a while ago
>>
>>102391268
What? The human brain is just stacking vision calls. You think we see in 3D?
>>
In silly, how can i basically disable send?
I want to trigger a response only with a quickreply I already set to execute on send.
But I get 2 messages because "send" is being fired too.
>>
I've got 4GB VRAM and 16GB system ram, what would be better: Quantized high-end model, or regular lower-end model? I don't mind if it's only something like 1-2tok/s, I can walk away from the computer. I just need quality but I know that's a tall order on my machine
>>
>>102391448
You can only really run quantized low end models.
Your best bet is probably mistral nemo q4 or thereabouts.
Actually, start with a llama3 8b based model, see if that works for you.
>>
>>102391448
High parameter models are better even if they are heavily quantized. But they are slower, even if they take up the same amount of RAM as a less quantized low parameter model.
Also reducing context length helps save on RAM, so stick with 2048 or 4096 context even if the model supports e.g. 131072.
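To put rough numbers on the context point: the KV cache alone grows linearly with context, roughly 2 (K and V) x layers x kv_heads x head_dim x bytes per element x context. Quick sketch below; the architecture numbers are made-up placeholders, not any particular model's real config.

layers, kv_heads, head_dim = 32, 8, 128
bytes_per_elem = 2  # fp16 cache
for ctx in (4096, 32768, 131072):
    kv_bytes = 2 * layers * kv_heads * head_dim * bytes_per_elem * ctx
    print(ctx, round(kv_bytes / 2**30, 2), "GiB")  # ~0.5, 4 and 16 GiB respectively

With numbers in that ballpark, full 128k context alone would eat more than your 16GB of system RAM, which is why sticking to 2048 or 4096 matters so much here.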
>>
>>102385729
>>102385745
>>102385875
>>102385920
>>102386018
>>102386054
>>102386184
>>102386620
sex
with miku
>>
>>102391346
>The human brain is just stacking vision calls
no
>You think we see in 3D?
see chapter 9, 11, 12
>>
File: q.png (115 KB, 2434x566)
>>102391362
Hope somebody knows this. Its really annoying.
Have my improved CoT quickreply in return. The original from last thread didnt work with nemo at least. (c) anthropic
>>
>>102390622
How does anyone use this shit
Like 95% of these are the equivalent of the toy wheel you give to a child in the car
>>
>>102391496
A quantized version of Llama-3SOME-8B is what I'm using and I'm pretty satisfied with the results, all things considered. Might check others out if you have any more recommendations in that range, it's pretty useable.
>>102391529
Any suggestions with reasonable tok/s count? I don't mind it being a little slow
>>
>>102391549
I don’t need to read some popsci shit to know how eyes work.
>>
>>102391599
>Any suggestions with reasonable tok/s count?
Don't worry about t/s. You will run out of memory with 16+4GB way before you will get intolerable speeds.
>>
>>102391448
you just need 4 more gb of vram and you can use these cool 12b nemo models at comfortable speeds (~12 t/s @ 4096 context).
go sell your xbox and grab a rtx 4060 or some shit.
>>
>"it was... intense"
>her voice dropped to a conspiratorial whisper
>her voice was low and husky
>her voice a seductive purr
>her breath was hot against ___ neck
>sent a shiver up ___ spine

bonus for weird symbolism:
>his erection represented her inner desire for freedom
>>
>>102391346

This is actually quite an interesting topic. When I learned that people who lacked vision for most of their lives and suddenly gained the ability to see were unable to make sense of the 3D world, it blew my mind.
>>
>>102391606
>I don’t need to real some popsci shit to know how eyes work.
good luck stacking "vision calls," surely it will outperform boston dynamics dumb robot
>>
>>102391653
Does more VRAM on a slightly older/lower-end GPU fare better than a generally higher-end GPU with 8gb?
>>
File: error.png (23 KB, 796x262)
What does this error mean and how do I fix? Kobold won't run
>>
>>102391852
Don't use vulkan, maybe?
>>
>>102390489
For me, it's
>feeling burnt out on AI
>neutralize meme samplers
>use only tfs + top-a and rep penalty
>it's a hundred times better instantly
>>
>>102391852
it means linux is shit
try openCL instead, or that fork of koboldcpp that supports ROCM (unless you're that guy who was having trouble earlier installing rocm on linshit?)
openCL works on basically anything, even the intel iGPU from my laptop.
>>
>>102391852
>r9 200 series
bruh
switch to cpu only
>>
>>102391877
>>102391961
This is windows and using the AMD fork of kobold too.

>>102391970
I-I'll try
>>
>>102392095
then it means windows is shit
>>
>>102392095
well then just use openCL --use-clblast 0 0 or whatever.
don't bother with offloading layers, if you don't have much vram it doesn't actually speed anything up or even reduce system ram consumption by any useful amount.
>>
>>102391825
You don’t have a damn clue how current genAI works. Given sufficient layers the model forms internal 3D representations. Stable diffusion, flux, minimax etc have internal 3D representations that arise spontaneously given sufficient training data. It is the same way the eyes + brain work. Boston dynamics and all of robotics to date is 3D internal representations with 2D input and a tensor of gear commands as output; reinforcement learning for movement control has no explicit 3D encoding; it’s implicit 99% of the time. Explicit 3D representations are for basic SLAM and shit, not AI. You don’t even know what you don’t know.
>>
>>102392135
I'll try that too. Thanks.
>>
>>102392240
>Stable diffusion, flux, minimax etc have internal 3D representations
No they don't, you pretentious retard. Stable diffusion cannot actually do anything 3D properly. Anything not directly trained into the model will look horrible if you try and change the default camera angle. e.g. picture of a big tiddy bitch from a drone directly overhead - won't look correct at all. It'll probably try to generate her lying down on the ground.
>>
>>102386620
Das a good miku
>>
https://char-archive.evulid.cc/#/booru/rainewaters/Sasha-chan++pygmalion1230
>>
>>102392830
god i love sasha
>>
>>102390622
dry maybe?
>>
>>102391549
audiobook andys who recommend books to other people should be marched into a furnace
>>
>>102387280
Update, pretty decent imo
its a lot more descriptive than the one I was using before
>>
>>102378613
>COT
What's cot?
>>
File: file.png (10 KB, 414x97)
haven't tried this shit for a long while
what model do you guys recommend for 16gb vram + 32gb ram?
pic rel were the last ones I tried a few months back
>>
>>102393535
Mistral Nemo 12b or a finetune of it.
>>
>>102393408
chain of thots
>>
File: RandomMikuEncounter.png (1.47 MB, 1216x832)
>>102391546
A wild Miku appears!
>>
>>102386269
Pornography is illegal in China, including in written form.
>>
>>102393658
I throw a watermelon at the Miku.
>>
>>102390696
I hate to break it to you but "genocide is bad" is a mainstream opinion that is going to be picked up on by language models even without conscious effort.
>>
>>102388632
Does it improve the context length that's usable at all?
>>
I'm new to all this and started using Donnager-70B-v1-Q4_K_M. I've got a 3090 GPU and 64GB of RAM. The text generations are taking really long. I don't need them to be lightning fast, but 20 seconds to generate a few sentences would be ideal. Should I be looking at a 30B model instead?
>>
>>102390696
Don't even bother. LLMs are midwit machines by design. They literally output the next word that the most people have said. It was impressive to see a machine talk back to you at first but the magic died quickly when you realize they only have the most popular, most predictable opinions that have ever existed.
>>
>>102391828
VRAM capacity >> memory bandwidth > compute
>>
>>102394100
>The text generations are taking really long.
Show your tokens / second and anons may be able to compare with their setups. But for a 70b, you're not gonna get super high speeds on a single card.
For smaller models, there's gemma-2-27b, which doesn't have many finetunes, and mistral nemo 12b.
>>
>>102394108
>They literally output the next word that the most people have said.
Not quite.
They output the next word that the most people have said given the current context, i.e. the conditional probability.
So for political discourse where the way issues are framed is strongly associated with specific political views you're going to get an echochamber machine by default.
>>
>>102394332

1.30T/s
>>
>>102394373
Damn, I'd hoped upgrading vram would get me 2T/s with 70b, but I guess I need 3 cards or something.
>>
>>102389294
>still up
>>
>>102394100
You're in a sad place. I bought another 3090 and I'm getting 15-20 tokens per second on a 70B.

Anyway, to fully fit the model into your 3090 and get something like 20-40 tokens per second, you'd want to use Command-R, Qwen 32B, gemma-27B or mixtral. Maybe even Nemo 12B, though I didn't use it for RP so I can't judge it for that.

For models that fit entirely into VRAM, you should be using exl2.
>>
>>102394420

My VRAM is nearly maxed out so it's possible a bottleneck is really slowing it down. It's always showing between 22GB/23GB in usage.

Again, im new to this so i could completely be doing something wrong.
>>
>>102391554
https://litter.catbox.moe/dsjurj.json
Not sure who needs this but this works well for me even on nemo 12b.
Executes when the user posts a message.
You will get 1 CoT message in spoiler before the AI reply, so you can swipe.
All previous CoT messages will be deleted to save context and avoid repetition.
A big problem was that the AI was falling back to assistant mode for out-of-RP CoT.
To avoid that the CoT is written from the perspective of the card.

Not really sure if it actually improves output though, need to test more.
>>
>>102394462
>you'd want to use Command-R, Qwen 32B, gemma-27B or mixtral. Maybe even Nemo 12B, though I didn't use it for RP so I can't judge is for that.
>For models that fit entirely into VRAM, you should be using exl2.

Are models supposed to be this difficult to find? Even if I find the right model name, it will show a download with a safetensors, GGUF, or exl2 extension, but only 1 of those 3.
Can you link me to an exl2 you think would suit me?
>>
We have to give it to OpenAI for bringing CoT to the masses
>>
>>102394506
Here's what I've been using:
https://huggingface.co/bartowski/c4ai-command-r-v01-exl2/tree/3_5

Here's a newer version that I never used:
https://huggingface.co/lucyknada/CohereForAI_c4ai-command-r-08-2024-exl2/tree/3.0bpw (you'd really want 3.5 bit rather than 3.0, but there don't seem to be any)

If you're using ooba, you're good. If you're using llamacpp or kobold, you can't do exl2 so don't bother—just find a 20-22GB gguf of any of those models and use that. It's going to be a bit slower than exl2 but still way way faster than your 1T/s.
>>
>>102394557
Local really deserved to get fucked over in this case. We've had it for a year and a half and nobody cared about it after llama1. Maybe now someone will work on a proper front-end for local models and not the horrible options we have now.
I really wish ST would just die as a project.
>>
>>102394565
>Here's what I've been using:
>https://huggingface.co/bartowski/c4ai-command-r-v01-exl2/tree/3_5
>Here's a newer version that I never used:
>https://huggingface.co/lucyknada/CohereForAI_c4ai-command-r-08-2024-exl2/tree/3.0bpw (you'd really want 3.5 bit rather than 3.0, but there don't seem to be any)

Even here though I only see safetensors models, not exl2. Am I missing something?
>>
>>102394645
exl2 uses safetensors as a container
>>
File: amdahls_law.png (167 KB, 1536x1152)
167 KB
167 KB PNG
>>102394420
The unfortunate reality is that the way speed scales with VRAM is highly nonlinear, see pic.
With 2x RTX 4090 for 70b q4_K_M on an empty context I get 1432 t/s prompt processing and 20.17 t/s token generation speed (the latter of which should be about the same as for 2x RTX 3090).
But even with 48 GB VRAM you won't be able to fit a lot of context while at the same time the retardation from quantization becomes way worse when you go below 4 bit.
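For intuition on why the scaling is so nonlinear, a back-of-the-envelope sketch (the per-device speeds are made-up assumptions, not measurements):

# Amdahl-style model of partial offloading; illustrative numbers only.
gpu_tps = 20.0   # tokens/s if the whole model ran on the GPU (assumed)
cpu_tps = 2.0    # tokens/s if the whole model ran on the CPU (assumed)

for offloaded in [0.0, 0.25, 0.5, 0.75, 0.9, 1.0]:
    # per-token time = time for the GPU share + time for the CPU share
    t = offloaded / gpu_tps + (1.0 - offloaded) / cpu_tps
    print(f"{offloaded:.0%} offloaded -> {1.0 / t:.1f} t/s")

The slow part dominates until almost everything is on the GPU, which is why offloading half the layers barely helps.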
>>
>>102394645
safetensors is a data storage format in the same vein as JSON, and it can have anything you like inside. Both transformers and exllama2 store their models in files with safetensors extensions. If you want to make sure it's exl2, you can look for exl2 text in config.json.
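If you'd rather check from a script than by eye, something like this works (just a sketch; the folder path is a placeholder for wherever you downloaded the model):

# Quick check whether a downloaded model folder looks like an exl2 quant.
import json
from pathlib import Path

model_dir = Path("models/c4ai-command-r-v01-exl2")   # hypothetical local path
config_text = (model_dir / "config.json").read_text()

if "exl2" in config_text:
    # newer quants may also carry a quantization_config block; print it if present
    print("looks like exl2:", json.loads(config_text).get("quantization_config"))
else:
    print("no exl2 marker in config.json, probably plain transformers weights or something else")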
>>
>>102394623
ST will eventually become a Frankenstein monster. It will have so many convoluted features and configs that your use case will be covered, but you will have to fiddle and fuck around. This however will discourage attempts to improve and do things properly because hey you can already do that in ST bro
>>
>>102394679
this. offloading doesn't do jack shit unless you offload all of it.
>>
>>102394679
I don't need much of a speedup, I just want 2T/s, and I get 1.5 now, but what people say leads me to believe that even tripling my vram from 8 to 24 wouldn't get me there.
>>
>>102394719
Well, if you look at 1.5 to 2, it doesn't seem like much, but percentage-wise you want a 33% increase.
You can see on the graph that not much happens until you get about 80% offloaded.
Gotta wait for bitnet, or something.
>>
>>102394679
08:41:19-299879 INFO     Loaded "miqu-1-70b.q4_k_m.gguf" in 8.65 seconds.                                                                                                                                                                                    
08:41:19-301243 INFO LOADER: "llama.cpp"
08:41:19-302090 INFO TRUNCATION LENGTH: 16384
08:41:19-302983 INFO INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"

llama_print_timings: load time = 189.50 ms
llama_print_timings: sample time = 104.79 ms / 73 runs ( 1.44 ms per token, 696.64 tokens per second)
llama_print_timings: prompt eval time = 188.31 ms / 12 tokens ( 15.69 ms per token, 63.72 tokens per second)
llama_print_timings: eval time = 3994.72 ms / 72 runs ( 55.48 ms per token, 18.02 tokens per second)
llama_print_timings: total time = 4555.07 ms / 84 tokens
Output generated in 5.19 seconds (13.87 tokens/s, 72 tokens, context 12, seed 839788421)
Llama.generate: 12 prefix-match hit, remaining 83 prompt tokens to eval
>>
Best model for 48GB VRAM and 32GB RAM nowadays? Generally, what's the best model for RP nowadays?
>>
>>102394761
For comparison:

08:44:02-643094 INFO     Loaded "Dracones_Midnight-Miqu-70B-v1.5_exl2_4.0bpw" in 19.40 seconds.                                                                                                                                                              
08:44:02-644208 INFO LOADER: "ExLlamav2_HF"
08:44:02-645219 INFO TRUNCATION LENGTH: 16384
08:44:02-646098 INFO INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"
Output generated in 9.07 seconds (14.88 tokens/s, 135 tokens, context 12, seed 1859634060)
Output generated in 2.77 seconds (11.54 tokens/s, 32 tokens, context 157, seed 570974459)
Output generated in 3.24 seconds (15.45 tokens/s, 50 tokens, context 157, seed 2012484294)
Output generated in 3.76 seconds (13.82 tokens/s, 52 tokens, context 218, seed 275116343)
Output generated in 31.88 seconds (16.06 tokens/s, 512 tokens, context 281, seed 1587517452)


Previously I had issues loading llamacpp on two 3090s, but now it seems to work fine. Maybe an ooba update fixed it.

>>102394789
I use Mistral Large 2.75bpw exl2, 16k context. There's an anon who thinks I'm a fool for doing that. Let's hear what he says.
>>
>>102394808
>>102394679
Oh, and it seems that enabling the row_split option in ooba does not bring any improvement. Is it meant to be like that?

08:48:11-147214 INFO     Loaded "miqu-1-70b.q4_k_m.gguf" in 9.00 seconds.
08:48:11-148541 INFO LOADER: "llama.cpp"
08:48:11-149389 INFO TRUNCATION LENGTH: 16384
08:48:11-150260 INFO INSTRUCTION TEMPLATE: "Custom (obtained from model metadata)"

llama_print_timings: load time = 212.80 ms
llama_print_timings: sample time = 134.78 ms / 94 runs ( 1.43 ms per token, 697.43 tokens per second)
llama_print_timings: prompt eval time = 211.33 ms / 12 tokens ( 17.61 ms per token, 56.78 tokens per second)
llama_print_timings: eval time = 5843.25 ms / 93 runs ( 62.83 ms per token, 15.92 tokens per second)
llama_print_timings: total time = 6548.52 ms / 105 tokens
Output generated in 7.18 seconds (12.95 tokens/s, 93 tokens, context 12, seed 824418286)
Llama.generate: 12 prefix-match hit, remaining 108 prompt tokens to eval

llama_print_timings: load time = 212.80 ms
llama_print_timings: sample time = 724.48 ms / 502 runs ( 1.44 ms per token, 692.91 tokens per second)
llama_print_timings: prompt eval time = 751.84 ms / 108 tokens ( 6.96 ms per token, 143.65 tokens per second)
llama_print_timings: eval time = 31868.10 ms / 501 runs ( 63.61 ms per token, 15.72 tokens per second)
llama_print_timings: total time = 36406.36 ms / 609 tokens
Output generated in 37.04 seconds (13.52 tokens/s, 501 tokens, context 121, seed 1890250888)
>>
>>102394760
Just need an affordable 32gb card to come to market then. Or get 2 16gb ones maybe?
>>
>>102394854
>affordable 32gb card
que
>>
>>102394854
>Just need an affordable 32gb card
Hahaha
>Or get 2 16gb ones maybe?
No, that doesn't really make much sense. Better to just get a 3090, really. You lose too much speed splitting between slower cards.
>>
>>102394868
A 3090 won't get me any speedup though, as was established, unless I can offload 80%.
>>
>>102394841
2x+ prompt processing speed?
>>
>>102394868
Splitting between cards loses no speed. You're gonna be at least as fast as on one card with same specs.

>>102394875
If you're on 8GB currently, try Nemo. Yes, it's not a 70B, but if you never used it before, you gotta at least try.

>>102394891
in >>102394761 it's 12 tokens ( 15.69 ms per token, 63.72 tokens per second)
in >>102394841 it's 12 tokens ( 17.61 ms per token, 56.78 tokens per second)
The 143.65 is just because it's more tokens to process.
>>
>>102394914
>Splitting between cards loses no speed.
Really? Last I heard, the more you split the more you lose, because the cards aren't working in parallel unless you have NVLink or something.
>>
>>102394937
Well, they're not, but each is doing its work at its original speed, which is what you get in the end. You don't get a 2x boost by using two cards, but you also do not get slower than 1x.

The row_split option for llamacpp should get the cards to work in parallel for the fc layers, but it doesn't seem to, as seen in >>102394841
>>102394761.
>>
>>102394914
What i'm reading is:
>15.69 ms per token
>17.61 ms per token
>6.96 ms per token
On prompt eval time. I don't know which has row split, but the third one is going much faster.
>>
>>102394974
The first message has no row split, the second message has row split.
>>
>>102391740
I often encounter "her voice ____" slop, yeah
>>
>>102394719
>>102394875
Another 3090 will definitely get you a speedup, but the increase is nonlinear.
Alternatively, since the biggest bottleneck for CPU+GPU hybrid inference is the RAM bandwidth, upgrading/overclocking your RAM would get you better performance at a lower cost than adding an extra GPU.
(I hope I don't have to tell you to enable XMP.)
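Back-of-the-envelope for why RAM bandwidth is the ceiling (assumed numbers, plug in your own): every generated token has to stream the CPU-resident weights from RAM once, so

# Rough decode-speed ceiling for the CPU/RAM half of hybrid inference; assumed numbers.
ram_bandwidth_gb_s = 50.0   # e.g. dual-channel DDR4-3200 lands roughly here (assumption)
model_size_gb = 40.0        # ~70b at q4_K_M (assumption)
fraction_in_ram = 0.5       # half the layers left on the CPU (assumption)

bytes_per_token_gb = model_size_gb * fraction_in_ram
print(f"CPU-side ceiling: ~{ram_bandwidth_gb_s / bytes_per_token_gb:.1f} t/s")  # ~2.5 t/s here

Double the RAM bandwidth and that ceiling doubles, no new GPU required.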

>>102394841
--split-mode row needs a lot more optimization, right now it's only really worthwhile for GPUs that are comparatively slow vs. the interconnect speed.
So unless you have the 3090s connected via NVLink I would not expect the performance to be better.
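If you want to A/B it from Python instead of the CLI, a sketch along these lines should do it. I'm assuming a recent llama-cpp-python that exposes split_mode and the LLAMA_SPLIT_MODE_* constants (they mirror the --split-mode flag); the model path is just an example:

# Compare layer split vs. row split across GPUs with the llama-cpp-python binding.
import llama_cpp

for mode in (llama_cpp.LLAMA_SPLIT_MODE_LAYER, llama_cpp.LLAMA_SPLIT_MODE_ROW):
    llm = llama_cpp.Llama(
        model_path="models/miqu-1-70b.q4_k_m.gguf",  # placeholder path
        n_gpu_layers=-1,   # offload every layer
        split_mode=mode,   # layer: whole layers per card; row: each matmul split across cards
        n_ctx=4096,
    )
    out = llm("Write one sentence about llamas.", max_tokens=64)
    print(mode, out["usage"])
    del llm  # free VRAM before loading the next configuration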
>>
>>102394565
>https://huggingface.co/bartowski/c4ai-command-r-v01-exl2/tree/3_5

Okay, I'm using c4ai-command-r-v01-Q4_K_M.gguf and getting 3.94T/s. That's an improvement, but how are people getting like 20T/s?
>>
>>102395180
What videocard? What software?
>>
>>102395188
3090RTX using Koboldcpp.
I'm usually loading the defaults: 512 batch size and 4096 context size. I could lower the batch size to 256, but I already have the model following instructions as it is.
>>
>>102395212
It should be a lot faster. Maybe you still have offloading to CPU set up in settings? The model should be entirely in GPU for speed. Maybe something eating up your VRAM? Ctrl+shift+esc.
>>
>>102395232

Before loading Kobold my OS is using 1GB of VRAM, so the amount of VRAM I can allocate towards this is 23GB instead of 24, but I don't think that's a big deal.

GPU Layers is set to -1, which I think means it doesn't offload to the CPU?
>>
smedrins
>>
>>102395276
Try a smaller quant then. Dunno.

The system taking 1GB is a lot. I'm down to about 200MB on my Windows machine.
>>
File: kobold.png (46 KB, 467x207)
46 KB
46 KB PNG
>>102395289

It looks like 1GB of VRAM is getting delegated elsewhere. Would this be slowing it down?
>>
>>102395353
Just try a smaller quant. Yes, 1GB shared VRAM would be enough to fuck the speed down to 4T/s.
>>
Does anyone know how to prefill the assistant's response using the chat completions API?
>>
File: ClipboardImage.png (41 KB, 1149x177)
41 KB
41 KB PNG
How do I fix this shit? Mistral Nemo base, Q4_K_M. Context 131072, temperature 0.5, rep pen 1.15. Is it bad settings or a bad model?
>>
>>102395657
I reduced context to 65536 and it fixed the problem (for now). I thought nemo was supposed to be a 128k context model?
>>
>>102395657
>>102395669
nope, it shat itself again. gonna try q8_0
>>
>>102395657
bad model
>>
>>102395657
>>102395669
models always claim super high context and can never deliver. Assume that at best it might handle half of what they claim, if even that.
>>
>>102395657
rep pen too high
>>
>>102395364

I'm still off by about 0.5GB of VRAM. If I go down another quant it says it's low quality, so I really don't want to use it.
>>
>>102395831
>can never deliver
llama 3.1 does
>>
>>102395364
>>102395837
I forgot what it's called, but I think there was some Windows option that disables automatic VRAM swapping.
>>
Crazy how much dumber Gemma 27B is compared to Nemo 12B.
It fails basic stuff like knowing what I did while it wasn't present, one message apart, even with OOC help included. Nemo trips up less.
>>
>>102395846
Maybe 405 handles 128k, but it's an exception, certainly not the rule.
>>
>>102385729
>Aunt Clara was a force of nature. She was a groomer by profession, running the most successful dog grooming parlor in town.
Damn, I got outsmarted by Mistral Large.
>>
>>102395831
Jamba wins again
>>
>>102395895
70b also handles 128k, and 8b handles at least 32k, which is more than any nemo finetune including the official instruct, despite being 2/3 of their size.
Mistral just fucked up when training nemo.
>>
>>102385729
Please prune the "Getting Started" links; some of them haven't been updated since 2023 and are outdated garbage.
It would save people wasting time.
>>
>>102393658
ANON used PLAP!
It's super effective!
MIKU is PREGNANT!
It can't move!
>>
>>102395837
Don't let the elitists scare you. An IQ2_XXS or IQ2_XS (2.5-bit-ish quant) 70B GGUF will do you fine and be better than sub-70B stuff. Quants are FAR less important (2.5-bit minimum) than people make them out to be: a 2.5-bit 70B > anything sub-70B at any quant. I say this as someone who can only run 70B at shitty t/s and can run anything sub-70B alright.
>>
>>102396026
Too bad llama 3/3.1 is awful at RP.
Also, the 70B only scores 66.6 at 128k, which isn't really that great:
https://github.com/hsiehjackson/RULER
and most other models indeed claim high and deliver way low
>>
>>102396123
New CR32 does better than new CR+ huh?
>>
>>102396123
>all that slop in new CR+
>barely an improvement
Cohere lost.
>>
>>102396123
Something to consider that the list isn't showing: quantization can kill long-context performance.
>>
>>102396205
Never heard anyone claim that, and then there's this
>>102396104
What to believe?
>>
>>102396205
I've heard that it's an exl2 issue due to the calibration dataset being short; ggufs without imatrix should be fine.
>>
>>102396290
>>102396290
>>102396290
>>
File: 1699112185431576.png (32 KB, 864x438)
32 KB
32 KB PNG
>>102396222
Look at these perplexity scores of Qwen 1.5 models. Lower is better.
>>
>>102396325
Oh, I forgot: that's without the optimized Q2 quantization methods.
>>
>>102396325
Perplexity only tells you how well, after quantization, the model is retaining the information it's memorized during pretraining.
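For reference, the number itself is just exp of the average next-token loss over some eval text. A minimal sketch (gpt2 and the sample text are placeholders; real evals use wikitext-style corpora and long contexts):

# Perplexity = exp(mean next-token cross-entropy) over an eval text.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "The quick brown fox jumps over the lazy dog. " * 20   # toy eval text
ids = tok(text, return_tensors="pt").input_ids
with torch.no_grad():
    loss = model(ids, labels=ids).loss   # mean cross-entropy of predicting each next token
print("perplexity:", torch.exp(loss).item())

So it mostly rewards reproducing text the model has already seen, which is exactly the limitation being pointed out.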
>>
>>102396361
Point is, look how little difference it makes. Do you really think that's enough to put it below models of MUCH smaller size?
>>
>>102396046
I would like to, but the OP template only has 3 free characters.
I skimmed through them just now and they all seem to cover different topics. Most of them still seem fine if you ignore the model sections. I don't use ROCm so I can't judge that one.
We need someone to volunteer to write a new consolidated getting-started guide. In the meantime, I might start dropping them one by one if I need more space for the news.
>>
>>102396421
There's more to model performance than information memorization. What about attention to detail in context, how well it can draw logical conclusions and extract facts from it, and so on? Just measuring how well the model can reproduce text it's seen many times during pretraining doesn't paint a complete picture of the damage quantization does.
>>
>>102396425
Thank you for considering doing this.
As someone essentially new to the topic, it's disheartening to be told repeatedly to READ the instructions, only to find out they're the best part of a year old, when even a basic understanding tells you AI changes weekly. It's not encouraging that the basic noob guide may be starting someone off a year behind everyone else, and the "basics" may provide a substandard starting point given the new developments.
Some community-guided updates would be immensely useful, as the general technical level of the threads is far, far above "starter" level.
Thanks again.


