/g/ - Technology

File: 1710266621871822.jpg (462 KB, 1664x2432)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101584411 & >>101578323

►News
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101584411

--TTS improvements and output issues: >>101586575 >>101586607 >>101586659
--Mistral nemo configuration and settings advice: >>101585456 >>101585527 >>101585596 >>101585669 >>101585834 >>101585868 >>101585572 >>101586019
--Sillytavern single sentence replies issue: >>101587180 >>101587200 >>101587246 >>101587225 >>101587275 >>101587269 >>101587353 >>101587401 >>101587413
--Recommendation for voice data TTS finetuning: >>101585560 >>101586101 >>101586163 >>101587016 >>101588184
--Nemo generates quadrupeds well but writes differently than chatgpt: >>101587732
--Logical flaws in GPT-4 and Claude, Command R Plus gets it right: >>101584587 >>101584617
--GitHub repo for bulk downloading cards for ST: >>101585689 >>101586342
--Anon asks for Command-R Plus alternatives.: >>101585536 >>101585556 >>101586438 >>101586483 >>101586596 >>101586657
--largestral iQ2_M outperforms Nemo in retarded quant, but is slower than 1t/s: >>101585893 >>101585921 >>101585940 >>101585998 >>101586017 >>101585939 >>101585985
--Nemo repetition issues and DRY sampler settings recommendations: >>101587028 >>101587049 >>101587511 >>101587535 >>101587576 >>101587545
--MoEs for roleplaying? Try it and find out: >>101584540
--Mistral Nemo sampler settings cause rambling output: >>101585928 >>101585955 >>101586019 >>101586038 >>101586062
--Where do ST or other UIs cull example dialogue in the context window?: >>101584746 >>101584777
--RULER repo measures effective context length, Llama3.1 performs well: >>101586297 >>101586352 >>101586384 >>101587005 >>101587027
--IQ4_XS vs Q3_K_M model quants and accuracy discussion: >>101585131 >>101585176 >>101585200 >>101585383 >>101585434 >>101588262
--IQ1_S performance and characteristics discussion: >>101588056 >>101588068 >>101588140 >>101588159 >>101588129
--Miku (free space): >>101587473 >>101588754 >>101588896

►Recent Highlight Posts from the Previous Thread: >>101584415
>>
post (You)r largestral presets
>>
File: 00170-699389629075918.png (1.47 MB, 1024x1536)
>>101589142
i got a little chub seeing my repeated (You)s in this AI generated recap
thank you, botkind.
>>
I am once again asking for mini-magnum presets.
>>
>>101589160
I didn't actually try it:
>>>/vg/487568316
>>
gib nemo presets
>>
File: robotnik-jump.gif (14 KB, 420x420)
>>101589210
>>101589219

just use the ones i linked from that anon >>101585456
in fact fuck it ill re-copypaste it again

Here, since so many people seem to be using nemo with wrong formatting then complaining:

Mistral context template: https://files.catbox.moe/6yyt8d.json

Mistral instruct template:
https://files.catbox.moe/rfj5l8.json

Mistral Sampler settings:
https://files.catbox.moe/tbsgip.json

Should be night and day for people who have it set up wrong. Make sure whatever backend you are using has DRY sampling.
>>
So, what was the point in MistralAI sabotaging their 8x22B with the shitty official -Instruct version and the botched release? Is this a psyop by their Partners at Microsoft trying to make MoE models look bad?
>>
>>101589231
Nemo doesn't use spaces around INST.
>>
File: 1336508850696.gif (1.93 MB, 245x187)
How're you guys feeling? As the dust settles down, it really feels like we've never been more back. Back to back releases, putting local about on par with cloud in performance/cost, and it's still not over, we're going to get more next week. We are not even 3 years into the timeline since the ChatGPT hype began.
>>
>>101589262
I dunno i've been using it with magnum just fine.
>>
>>101589244
Maybe they didn't have time, and without the release of 405B, they didn't feel the need to release their best stuff.
>>
so mini-magnum is the best cooming model for vramlets now?
>>
>>101589231
>dry sampling
Does Koboldcpp have this (I don't see it) or am I fucked?
>>
The people that are using 4 3090s... Where are they putting them?
>>
Aah, 30t/s... This is the good life. Thank you Arthur.
>>
>good model release
>people saying low quants are fine, others saying there's night and day differences (probably broken quants)
>prompt/template issues left and right
Every time... I guess I'll wait 2MWs then...
>>
>>101589289
That or just Nemo-Instruct.
>>
>>101589265
You can see this as something good; we are on par with the big boys, after all. But you can also see it as pure doom: the big boys have barely moved since the release of GPT-4.
>>
>>101589307
I'm the night and day difference anon and I should clarify my quants are definitely not broken, I do them all myself
q4km was still *fine*. better than 70bs or CR+ still, just kind of dry, generic, a little less sovl, a little more awkward - but q5ks was sharp as a tack and much more coherent, pulled in more little details, had more of those creative little turns of phrase that let you know it's really paying attention
lower quants are still usable and the model will still be good, it's not like they're totally fucked or anything, it's just that the second I bumped up the quant it felt like the model gained a real human touch that was lacking before
>>
>>101589307
>people saying low quants are fine, others saying there's night and day differences (probably broken quants)
more like
>people saying low quants are fine (poorfags who can only run low quants at 3t/s), others saying there's night and day differences (people who can actually run these models properly)
>>
>>101589370
I test through online services (mainly lmsys) to compare the quants I downloaded against their "intended" performance. Otherwise I would not be able to say with full confidence that a model like 8x22B cannot do trivia like DBRX can.
>>
where's the dry sampler settings on ST?
>>
>>101589356
Did you use imatrix? The quants I'm using are all imatrix calibrated. Also they're the IQ format which I think were supposed to be more knowledge-retaining compared to K quants but I'm not certain.
>>
File: 1710741814225103.png (17 KB, 721x182)
Cohere gathered another $500m from investors. CR++ will be a beast of a model.
>>
>>101589142
good bot
>>
File: dry staging.jpg (110 KB, 607x1212)
>>101589491
There, I am on staging branch.
>>
>>101589536
I really wonder how businesses are using these products to make money.
>>
>>101589550
speculative capital, one of these might be the next big break through
>>
>>101589265
>We are not even 3 years into the timeline since the ChatGPT hype began.
>ChatGPT initial release: November 30, 2022; 19 months ago
>>
nvidia-smi is not displaying all of my GPUs, but neofetch is. how do i fix this? i cant run any AI applications due to an error about cuda devices not being found
>>
>>101589653
>>
>>101589642
It hasn't even been 2 years? Wtf
>>
>>101589653
Change your environment variables, I guess.
>>
>>101589550
If performance improvements plateau and you have ~5 years of scaffolding/agent development with no valid use cases, you might have a point. It's only been 19 months since ChatGPT released. Doomers just really want to see LLMs go the way of 3D TVs for some reason.
>>
>>101589688
how do i do that?
>>
man, that mini magnum finetune of Nemo 12B is actually starting to replace claude for me, which is nuts considering claude has got to be at least 50 times bigger
>>
>Claude 3.5 Sonnet and Llama 3 405B stomping GPT-4o
>Llama 3 405B is way fucking cheaper than GPT-4o
>It's only a matter of time before a cheaper and more capable model than GPT-4o-Mini comes out and kicks them out of the cost-performance pareto front entirely
Is he really just banking on Strawberry?
>>
>>101589762
>It's only a matter of time before a cheaper and more capable model than GPT-4o-Mini comes out and kicks them out of the cost-performance pareto front entirely
Claude 3.5 Haiku probably
the original haiku beats the shit out of 3.5 turbo which was the sota small cheap model at the time
>>
>>101589715
Type "export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5"
>>
File: IM NOT SLEEPY.jpg (58 KB, 714x725)
>update tavern
>even with all my settings and shit in order, the gen quality is fucked UP bad
>wtf could possibly be-
>mfw i forgot to enable instruct mode
>>
>>101589265
I do wonder how many OG AI Dungeon era people stuck around to witness this. I joined around the late GPT-2 times, and now I'm running IQ4 largestral. I don't see myself ever ending the ride.
>>
>>101585978
Same, Nemo might be retarded and repetitive at times, but it has some surprising creativity if you push it
>>
>>101589907
MOOOOOOOOOOOOOOOOODSSSSSSSSSSSSSS
>>
>>101589907
Ew
>>
>>101589539
thanks, i'll take a look
>>
Here comes the pedo tranny thirdie again.
>>
>>101589653
did you enable 4g decoding in bios? also check dmesg for errors from nvidia driver.
>>
File: 36993673.jpg (287 KB, 1082x695)
>>101589872
I used to be so happy with my loli imouto scenarios on AI Dungeon, I used to think running LLMs locally would be impossible because Pygmalion 6B used all my RAM and was as slow as a snail.
Now, I'm here, running NeMo still enjoying my loli imouto scenarios, but without fear of suddenly being cucked.
Feels good.
>>
>>101589872
I joined back in December 2019. I remember the humble days of Clover where the AI was too fucking stoned to even remember your character's name, much less what was happening
It was absolute dogshit and now here we are
>>
>>101589265
Imagine Terry's reaction to the LLM tech, writing llama.cpp but in holyC to replace his text oracle perhaps.
>>
>>101589290
get sillytavern staging, and ((pull))

>why does anyone use response tokens over 256? 512 is hellish
>>
>>101589762
He just needs to reignite the AGI hype by adding smell to the multimodal model. Or maybe he can tease Sora again.
>>
jesus man, Nemo is INSANELY horny. My OCs are a bajillion times more frisky with Nemo than with any other model I've ever used. On one hand I'm overwhelmed, yet it manages to blend that spice with their personalities perfectly. It doesn't skip a beat.
I almost want to say I wanna tone down the horny, but it's not like it breaks story flow or makes ERP more difficult or anything. I'm personally just not horny right now kek
>>
>>101589971
The realism of this surprised me for a bit until I realized the popsicle is constantly changing shape...
>>
>>101590044
arthur's personal coomtune strikes again
>>
>>101590054
Why did he do it?
>>
>>101589231
Is such a simple prompt best? No one uses those crazy ones they were using before?
>>
>>101589265
We're so back. Zucc and Yann are false prophets, Silicon Valley are false prophets. Viva la France
>>
>>101590073
Yeah, it's never really mattered that much; it was always placebo.
Which makes the Agent 47 crackhead prompt situation even funnier.
>>
>>101589292
Just get two a6000s or something if you want to be more compact.
>>
>>101590109
Interesting. So it's more down to the card itself and what examples you give it to emulate?
>>
nemo is schizo...
>>
>>101590170
A bad card can break any model, doesn't matter. It's why W++ for example is memed on so hard, there's no exact science it's just basic logic of garbage in garbage out.
>>
>>101589262
So I should change that so there's no spaces on the INST ones? What about the \n after </s>?
>>
>>101590172
You're using a temp too high
Mistral says in the model card that it likes low temperatures, they say 0.3
though I find up to 0.4-0.5 is usually fine
>>
>>101590229
NTA but I use simple sampling and for RP Nemo handles 0.7-0.8 just fine. Occasional schizo moments at 0.8. Starts getting really dry at 0.7 and lower. 0.3 is probably to prevent hallucination when using it for normie shit.
>>
I'm swiping this popular character card and the responses from mini-magnum and Claude Opus are identical. Claude walked so nemo could run.
>>
anyone running an exl2 mistral quant? I get gibberish with a 4.0bpw turboderp quant.
>>
I just downloaded 3 more IQ models below IQ2_M to see if any would be able to answer one of my challenging trivia questions as perfectly as IQ2_M did. Turns out IQ2_M is the cutoff for this particular question. IQ2_S gets the question partially right. About half of the points I would say. IQ2_XS and below basically just get it increasingly wrong, until IQ1_S which nearly went schizo-tier. Guess I'll just live with 1-2 t/s.
>>
>>101590287
3.5bpw is working perfectly fine even at 4-bit cache.
>>
>>101585837
do two gpus work faster than or slower than a single one if you can fit it in?
does Vllm split by row or by column? does it do tensor parallel? does nvlink in 3090 help by a lot? does the performance of 2 gpus differ much from 4? BTW, did you try cpu offloading in Vllm?
>>
>>101590287
yeah, turbo's 3.5bpw + 4-bit cache is running fine for me on ooba.
i don't know if it's necessary, but i updated transformers from source, like the mistral-large readme said.
>>
>>101590329
It's 2024. Why is VRAM still hard to obtain? It's literally just soldering more transistors into your chip. Why? Now you have people running two servers in parallel just to serve a model.
>>
>>101590109
How do you tell it to not act for the user then? I always have that issue.
>>
>>101590383
something specific causes that, i forget what, i started getting it tonight actually.
someone will chime in to inform us kek
>>
>>101590383
using
>write {{char}}'s next reply
in the sys prompt usually fixes this for me
>>
File: 1692389808623804.jpg (163 KB, 1058x926)
so how much money do I have do spend to run 405b at home?
>>
>>101590319

Largestral? Does 3.5bpw fit in 48GB vram? How much context?
>>
>>101590374
simple answer
>greedy Nvidia encrypts vbios
>>
>>101589265
(((Openai))) is $5B in red this year
>kek
>>
>>101590419
Just run largestral instead. Better for most users' purposes. 3x 3090s+
>>
OK, I tried mini-magnum-12b, the Nemo finetune, as an exl2 8bpw quant, but like some time ago, Nemo is broken for me with exllama: it doesn't follow the SillyTavern template and writes a lot of text filled with nonsense. I'll try llama.cpp later. Any advice?
I'm using the settings from this anon >>101585456
>>
File: incognito.png (484 KB, 512x768)
>>101589136
Thread Theme:
https://www.youtube.com/watch?v=7yJRsFFRoQY
Don't mind me, just a stranger blowing through this town...
>>
>>101590536
God. I hope you don't write like that to the poor llm. Are you sure you're using the proper template? Have you updated ST and exl2 since the last time you tried?
>>
>>101590319
>>101590346
thanks. it seems like something with my samplers broke it. I neutralized the samplers in sillytavern and it started working.
>>
why are some people here using small quants of a 12B model
even if your GPU is only 8GB you can run Q6 at a very good speed with some offloading
>>
>>101590531
>3x 3090s+
I've only built one PC in the past, and I don't know of any standard motherboards that support that many GPUs. My first thought was something like picrel, basically a mining rig. Without NVLink it's gonna be pretty bad, as far as I understand. How did you, or anybody you know, do it?
>>
>>101590711
Thats basically the idea.

https://www.amazon.com/Kingwin-Professional-Cryptocurrency-Convection-Performance/dp/B07H44XZPW/ref=sr_1_1?sr=8-1
>>
>>101590711
open air build like a mining "case", riser cables, any motherboard with 4 pcie slots, does not have to be x16 x8 or whatever. Even x1 is enough. Just get 4 of them.
>>
>>101590576
Yes, I did an upgrade a moment ago. Do I have to set something for the alpha value?
>>
>>101590576
>Are you sure you're using the proper template?
I'm using the one which was shared in the last thread.
>>
>>101590711
This guy did one with 7x4090s. You can see what his concerns were. He goes pretty in-depth. https://www.mov-axbx.com/wopr/wopr_concept.html
>>
>>101590720
>>101590720
>>101590754

I just had an idea, and I'm sure somebody else has had it in the past as well. For dense models running across multiple GPUs without NVLink, performance gets worse and worse the more cards you add, because they have to wait for each other to finish before computing the next hidden layer state. But what if you take a MoE model, for example DeepSeekV2 236B, and split the different smaller experts across the GPUs so that they don't have to exchange information? Is this thinking flawed?
>>
>>101590536
Enable "Add BOS Token" in ST
>>
>>101590774
Thats not how moes work.
>>
>>101590781
but how do they work then.
>>
>And finally, we have the Arch Linux package updates. Oh boy, I can barely contain my excitement! You have a whopping 106 packages begging to be updated. I mean, who doesn't love a good update cycle? It's like playing a game of "spot the broken dependency"! Good luck with that.
i love when it sasses me
>>
>>101590786 (me)
>Mixtral is a sparse mixture-of-experts network. It is a decoder-only model where the feedforward
block picks from a set of 8 distinct groups of parameters. At every layer, for every token, a router
network chooses two of these groups (the “experts”) to process the token and combine their output
additively. This technique increases the number of parameters of a model while controlling cost and
latency, as the model only uses a fraction of the total set of parameters per token.

I don't see how my thinking is flawed, someone educate me. just have 2 parameter groups on each gpu and the supervisor on the last one.
>>
>>101590711
If you wanna stay on standard architecture and don't wanna invest in workstation CPUs, then the MSI MEG X570 Godlike motherboard is a great choice with 4 slots for GPUs. I wanted to build a bigger PC with 4 3090 cards, but now I'd rather wait for the 5090 announcement next year.
>>
So is there a reason why Llama 3.1 that I downloaded from the official repository doesn't come with any config.json, and every single piece of documentation I've found that can supposedly convert them to HF format doesn't work?
>>
>>101590804
llamacpp anon we need you, hes wrong and I know it but can't explain why.
>>
>>101590732
>>101590745
If I'm reading the setup files correctly (https://files.catbox.moe/tbsgip.json specifically):
It sets the temperature to 1, when the Mistral guys recommended 0.3 or 0.4. Change it to 0.3 and try again.
The second thing is repetition penalty. Disable it by setting it to 1.
If that makes it work better, then play around with the temperature. If it still doesn't work as you expect, post a screenshot of the output so we can see what you're talking about; "writes a lot of text filled with nonsense" is not that useful.
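If you want to rule out the preset entirely, you can also hit the backend directly with just those two settings. A minimal sketch against a local llama.cpp server (parameter names per its /completion API; other backends name these differently, so treat it as illustrative):
[code]
# Send one test prompt with temperature 0.3 and repetition penalty disabled.
import requests

payload = {
    "prompt": "[INST] Write a short scene on a rainy rooftop. [/INST]",
    "n_predict": 256,
    "temperature": 0.3,    # Mistral's recommended range is roughly 0.3-0.4
    "repeat_penalty": 1.0, # 1.0 = repetition penalty off
}
r = requests.post("http://127.0.0.1:8080/completion", json=payload, timeout=300)
print(r.json()["content"])
[/code]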
>>
>>101590819
What did you download? The original repo in meta's hf all have config.json files.
>>
>>101590307
There was some post-quant tuning that enhances the quality of IQ2 quants, but I don't remember where that was. Prolly the only way to run huge LLMs on 24GB with no major loss.
>>
>>101590819
By official you mean the repos on this account https://huggingface.co/meta-llama or a different site where they host their models? The config.json file definitely are in the huggingface repos. You should download them from there.
>>
File: hdca-news1.jpg (184 KB, 700x681)
>>101590711
>>
how much T/S do yall get with 4x 3090's on largestral at what quant
>>
>>101590774
only if you split by column and not by row. if you split horizontally it doesn't slow down since that's tensor parallel so you run in parallel . but you need good interconnection.
>>
File: 1463720797197.png (255 KB, 319x317)
I'm new to using SillyTavern. Is there a way to prompt the kind of response the AI generates to guide it in a certain direction without having to just rewrite the response entirely by hand? Like if I give it an open ended question and I want all its responses to be either positive or negative.
>>
>>101590939
Try including something like "Only answer positively/negatively" In the author's notes. Depth = 0 if you want it constantly reminded of it for every message.
>>
>>101590946
Thanks, I'll give that a try and see if it helps.
>>
>>101590939
I simply use group chat for a char and my OC, while posing as a narrator in user responses. Much more convenient from chat editing perspective than having author note open. Narrator just gives out barks for both characters, and then I mute narrator barks so that it doesn't try to act as narrator itself.
>>
File: 2024-07-27.png (381 KB, 1124x671)
>>101590778
>Add BOS Token
Is enabled.
>>101590843
>sets the temperature to 0.3
>Disable rep pen
I did this too. I tried setting the temp both lower and higher than 1.0, and this is the result.
>>
>>101590983
That's a great way to utilize the group chat. Makes me wonder what other things can be done with it.
>>
Where can I find/which gguf version of mini-magnum-12b should I use?
>>
>>101591073
https://huggingface.co/starble-dev/mini-magnum-12b-v1.1-GGUF
>>
>>101591073
the one that fits
>>
>>101591140
Thanks anon.
>>
>prema trying to do team orders in fshitter
>>
>>101590410
Doesn't seem to help, sadly.
>>
File: GS-IVOcbIAI5B6g.png (643 KB, 855x719)
>>101589231
Ok so I got koboldcpp, staging version of sillytavern, imported these three and made my persona a basic [{{user}} is a guy that has this color hair, this color eyes and this color skin]
Is there anything else I need to do to make this work? I got some random cards off chub but I dunno what makes a card good or retarded
>>
Can using smaller context size result in model retardation (within that context) or is it enough that I match the koboldcpp and sillytavern setting? I don't have the VRAM to run full 128k of nemo.
>>
>>101591291
No, the opposite: using a bigger context always degrades things at some point.
>>
>>101584777
>>101584746
Any ideas on where ED gets culled?
>>
>>101591301
Okay, thanks. So should I go for smaller context in favor of higher quants as well? Currently using Q6_K_L with 8k but I guess it may be worth it to go lower quant.
>>
>>101591314
8k is generally good with most recent models, above is when it gets iffy especially above 32k so if you're enjoying what you have just don't break stuff for no reason
>>
>ZeroWw 'SILLY' version. The original model has been quantized (fq8 version) and a percentage of it's tensors have been modified adding some noise.
>Full colab: https://colab.research.google.com/drive/1a7seagBzu5l3k3FL4SFk0YJocl7nsDJw?usp=sharing
>Fast colab: https://colab.research.google.com/drive/1SDD7ox21di_82Y9v68AUoy0PhkxwBVvN?usp=sharing
>Original reddit post: https://www.reddit.com/r/LocalLLaMA/comments/1ec0s8p/i_made_a_silly_test/
>I created a program to randomize the weights of a model. The program has 2 parameters: the percentage of weights to modify and the percentage of the original value to randmly apply to each weight.
>At the end I check the resulting GGUF file for binary differences. In this example I set to modify 100% of the weights of Mistral 7b Instruct v0.3 by a maximum of 15% deviation.
>Since the deviation is calculated on the F32 weights, when quantized to Q8_0 this changes. So, in the end I got a file that compared to the original has:
>Bytes Difference percentage: 73.04%
>Average value divergence: 2.98%
>The cool thing is that chatting with the model I see no apparent difference and the model still works nicely as the original.
>Since I am running everything on CPU, I could not run perplexity scores or anything computing intensive.
>As a small test, I asked the model a few questions (like the history of the roman empire) and then fact check its answer using a big model. No errors were detected.
>Update: all procedure tested and created on COLAB.
>https://huggingface.co/NeverSleep/Lumimaid-v0.2-8B/discussions/4#66a47badee3de8c56e1e0872
Oh boy here we go again...
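For reference, the "program" being described boils down to roughly this; a numpy sketch with parameter names of my own choosing, not the author's actual script (which works directly on GGUF tensors):
[code]
# Perturb a fraction of each tensor by up to +/- max_deviation of its value.
import numpy as np

def add_noise(weights, pct_weights=1.0, max_deviation=0.15, seed=0):
    """weights: dict of tensor name -> float numpy array."""
    rng = np.random.default_rng(seed)
    noisy = {}
    for name, w in weights.items():
        mask = rng.random(w.shape) < pct_weights          # which weights to touch
        noise = rng.uniform(-max_deviation, max_deviation, size=w.shape)
        noisy[name] = np.where(mask, w * (1.0 + noise), w).astype(w.dtype)
    return noisy
[/code]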
>>
>>101590850
>>101590878
I downloaded it with the download.sh and the signed URL that was emailed to me by Meta.
https://github.com/meta-llama/llama-models
>>
File: 1351317378049.gif (1.37 MB, 278x199)
I'm looking for cool instruction templates, anybody got one focused on the assistant directly creating an adventure experience for the user rather than playing the roll of a specific bot?
>>
>>101591364
could someone summarize this with their favorite model?
>>
>>101591471
basically add random noise for no reason and: "The cool thing is that chatting with the model I see no apparent difference and the model still works nicely as the original."
>>
>>101591471
weights actually don't matter
just scramble them and you're fine, which was expected considering that frankenmerges also still output readable content despite having unrelated layers stitched together
the 'consciousness' of a model is unrelated to this sort of thing
>>
>>101590987
>>101591140
I tried two models in both GGUF and exl2 and still get this level of retardation. I think I'll just return to Gemma 2.
>>
Any new models that work well without the CoT meme magic yet?
>>
so how big is a leap of quality between 8b smut and 405b smut
>>
>nemo keeps writing for me
HELP
>>
>>101589872
i member talktotransformer being my first interaction with textual AI, then we got aidungeon and its retarded ceo, then i found out about piggy and the rest is history
>>
nemo shill, i need your help. since nemo wasn't trained to have a system prompt at the top where should i put my 20 lines of meticulously crafted roleplay rules?
>>
been out of the loop for quite some time
what's currently a good model for a 16GB VRAM card?
>>
>>101591883
If you're in Silly, use either the Assistant last message prefix or an author's note. But expect possible degradation either way. I guess the only way to do it correctly is to add it before every one of your messages and then edit it out after each reply, which is absolute autism.
>>
I just tried Mistral-Large-Instruct-2407.IQ1_S.gguf from legraphista, but like other very low-precision quants it has issues with using the right tokens sometimes. I think this problem could be solved if the embed tensor was quantized to something better than Q2_K precision. Then, the model might still be dumb compared to the original due to compressed knowledge, but at least pick the right embeddings.
>>
>>101591941
>either Assistant last message prefix or author's note
ty, i'll try that
>>
>>101591968
We know Robert, we know, keep fighting the good fight!
https://huggingface.co/ZeroWw
>LLMs optimization (model quantization and back-end optimizations) so that LLMs can run on computers of people with both kidneys.
https://huggingface.co/RobertSinclair
>>
File: file.png (16 KB, 373x135)
>>101589231
>>101585456
Any tips for making the bot not write as me? Also I assume you mean this setting, right?

It definitely feels very rambly at 1024 reply tokens but that's probably because my persona is so barebones. Going down to 350 seemed better, although I have to reset my settings and test more because I got a lot of situations where the bot would end posts with a bunch of newlines or symbol spam
>>
File: file.png (50 KB, 1051x307)
>Based on comments from @mradermacher...
>His quant are okay if he do it before me, you can use them, he's thrusty.
>>
>>101591305
I tried in Faraday (Backyard) and it seems that ED is being cut down from the beginning rather than the end, which goes in line with how regular message history is culled.
I put lore facts in example dialogue and asked about things from the start and end section, the bot failed to answer properly about the former.
>>
>>101592015
1000 tokens is an incredibly long reply regardless of which model you're using
if you're wanting to simulate a conversation I don't understand why you'd even give the model the option of writing that much
>>
>>101592040
Thrusting into the popcorn
>>
File: bitnet-embedding.png (69 KB, 714x227)
>>101592010
Robert Sinclair has a point. BitNet models are also configured like that (see picrel).

https://arxiv.org/pdf/2310.11453
>>
>>101592087
So he has a point because a meme supports what he says? If anything that goes against him even more. Anyways the new gimmick is random noise now, get with the times!
>>101591364
>>
>>101590745
OK, after some testing, I think in my case the problem is indeed the template. I was using the same template from the thread, the one also marked in the recap, so it's not a mistake on my end. What's weirder is that with the template I use for Gemma 2, the bot is suddenly at least able to follow the text formatting. Sadly it still feels a bit unstable: some cards work better with a temperature of 1 and others with 0.4. Is this really the state of Nemo?
>>
>>101592100
There's no claim there that noise improves model outputs, although some time back there have been suggestions that adding noise to embeddings during training may reduce overfitting: https://arxiv.org/abs/2310.05914
>>
Where will AI be in 10 years?
>>
I wonder if those preferring Gemma all happen to be ESL and perhaps Gemma deciphers ESL better as a result of diversity training, just a thought.
>>
/aicg/bro here. Quick question. Who is the "Gojo" of /lmg/? (shitpost bogeyman schizo)
>>
>>101592161
petra/petrus
>>
>>101592163
thanks i just was bored in our general since we're in a bad doom, ill check the archives. have fun with your chatboots
>>
>>101592161
Isn't your entire general like that?
>>
>>101592153
If your billion dollar ai can't decipher ESL then what's the point?
>>
Anon whose KCPP guessed too many layers: can you share your GPU VRAM, model(s) (including image gen models if used), blasbatchsize, and the amount of context you were trying to use?

It has multiple things in place to prevent that from happening, so if it still under-guessed on your system I want to be able to reproduce the setup, because that would imply you somehow broke through the entire 1.5GB buffer zone we put in place as a safeguard.

Either you have a ton of background stuff running, or you're using a model that is way more VRAM hungry in unexpected ways than the stuff I tested with.

To clarify: in the current version, the auto layer guessing is only accurate for default settings. If you modify blasbatchsize, for example, that is not yet accounted for.
>>
Hi all, Drummer here...

>>101592180
HENKYYYY PENGKYYY!!!
>>
>>101592180
What are you doing here? You're too innocent for this website! :koboldpeek:
>>
>>101592180
Kekaroo, your dox got posted earlier faggot
>>
my hero just spoke in /lmg/. AMA.
>>
>>101591786
I can't make it stop either on one specific card I'm doing where it's an adventure/story rather than a one-on-one chat. IDK if this makes it harder but it probably doesn't make it easier. I put in the system prompt to write for every character except {{user}} and put in the jailbreak / depth 0 author's note never to speak for {{user}}. May have helped but didn't totally solve it. Possibly also made more difficult because I am simultaneously trying to make it stop ending replies by asking what my next action is, which I was able to reduce significantly but not eliminate. Partway through I tried cranking the temperature way down and that absolutely didn't fix the issue. Maybe if I tried again with my prompts setup better it would. Nothing solved it completely but right now the level of swiping / editing is low enough that I'm okay with things.
>>
>>101592274
>I can't make it stop either on one specific card I'm doing where it's an adventure/story rather than a one-on-one chat.

Which isn't to say I *have* been able to get it to stop on other cards, just that I've only been working on this one.
>>
>>101592180
Keep up the great work, Henky!

Tell your assistant, Concedo, he did a good job too. :koboldlaugh:
>>
>>101592247
Ooooh, someone's being an edgy boy. :koboldpeek:

You think you're so tough spouting that *f-word* behind the screen, huh?
>>
>>101592153
I sometimes think if I was ESL I'd like LLMs a lot more. Like if I'm reading a foreign language I can't tell if the writing is good or bad. I can just (at most) tell what information it says. And if the same expressions get used over and over I'm not annoyed, I'm pleased to see familiar expressions.
>>
>>101592040
Suddenly Lumimaid makes a lot more sense.
>>
>>101592323
I am an ESL. That is not how it works.
>>
>>101591917
An 8.0bpw exl2 of Mistral NeMo 12B with cache_mode q8 and 32000 tokens of context fits in 15.2 GB of VRAM.
>>
>>101589160
t=1.0
>>
Is it better to have 2x 3090 or 1x 3090 + 2x P40 if I'm trying to run 70b models faster?
>>
>>101592475
2x 90
>>
>>101592475
3x 3090 if you can but 4x 3090 would be even better
>>
>>101592040
I mean I knew he was belgian, but didn't know it was that bad.
>>
>>101592348
Don't lie I bet it's even stronger for u foreign cunts because your languages have like 1/5 as many words as English. Repetition is a way of life for you, while for English speakers developing a sense for how often to re-use the same word is a major early part of developing good writing style. Small children are very repetitive, older ones go too far trying to add variety, then they tone it down and get better. (Or sometimes not. There are published authors who go to unintentionally humorous lengths to avoid re-using basic words like "said.")
>>
>>101592040
kek
>>
>>101592546
>doesn't speak any foreign language
>don't lie to me, i bet-ack
>>
>>101592338
>>101592506
Now I see why he never tests his own shit. Even if it was broken how could he tell?
>>
>>101592564
Knew you were the kek poster.
>>
File: file.png (69 KB, 349x642)
>>101592546
>>
File: stfu.png (21 KB, 509x217)
>>101592546
>>
>>101589653
I have never run into this problem myself but I suspect it's a driver issue.

>>101590419
With a few hundred bucks you can buy 512 GiB RAM which is enough to run it at 8.5 bits per weight.
But then you can expect something like 0.2-0.5 t/s.

>>101590774
>>101590781
>>101590786
>>101590804
The problem with the proposed parallelization scheme is the synchronization overhead.
You need to exchange (part of) the activations between GPUs and write back the results which introduces non-negligible latency, especially on fast GPUs without NVLink.
This is not much different from what --split-mode row already does and there are considerable performance issues (though the multi GPU optimization is also poor).

>But what if, you take a MOE model, for example DeepSeekV2 236B, and split the different smaller experts across the gpus, so that they don't have to exchange information. Is this thinking flawed?
Which experts are selected is effectively random and determined by the routing layer if I remember correctly.
But in order to do that the results have to first be collected on a single GPU.
So you're not really saving any I/O.
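To illustrate the routing point, a toy top-2 router (shapes and names made up): every token can land on any pair of experts, so a static "experts 0-3 on GPU 0, 4-7 on GPU 1" split still has to gather and scatter activations across GPUs on every layer.
[code]
# Which experts fire is decided per token at runtime by the router.
import numpy as np

rng = np.random.default_rng(0)
n_tokens, d_model, n_experts, top_k = 8, 16, 8, 2

x = rng.standard_normal((n_tokens, d_model))       # token activations
W_router = rng.standard_normal((d_model, n_experts))

logits = x @ W_router                               # (n_tokens, n_experts)
chosen = np.argsort(logits, axis=-1)[:, -top_k:]    # top-2 experts per token
for t, experts in enumerate(chosen):
    print(f"token {t}: experts {sorted(experts.tolist())}")
# The pairs are effectively arbitrary, so tokens routed to "remote" experts
# must be shipped to the other GPU and the results shipped back every layer.
[/code]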

>>101592475
2x 3090 if your target quant fits into 48 GiB VRAM, 1x 3090 + 2x P40 otherwise.
>>
File: 1718298816889142.jpg (2.53 MB, 3108x1691)
Mistral Large 2 is now my main model for cooms.
No more mischevious glints, she says in a husky voice, a smirk playing on her lips, eyes sparkling with mischief. There's a playful glint as she addresses the power dynamic, playfully smirking as she offers her ministrations. An audible pop and rivulets of—admit it, pet—the ball is in your court.

It has none of that slop and even as a 48GB VRAMlet using a baby 2.75BPW exl2, it can fit 12k context @15t/s.
>>
>>101592681
lock em in a hot room and sell me the fumes
>>
>>101592496
Pretty much this. Although I'm starting to feel like a VRAMlet with 4.
>>
File: 1717392494482029.jpg (42 KB, 680x671)
>4x 3090s is now considered "VRAMlet"
>as if 1 wasn't pricey enough
no i will not dump retarded amounts of money onto a single-purpose machine i'd only use sparingly even if the models are appealing
>>
>>101591941
Couldn't it be put in context template?
>>
>>101592681
LL and 3L tag teaming S
>>
>>101592871
Also... isn't that the point of the "System same as user" option in ST? So you can fill in the system prompt and it treats it as a user message as well?
>>
>>101592870
I mean people spend more money on dumber hobbies. It really depends on how far you want to go. I started out running 4-bit pygmalion 6B on a Ryzen 2400G with 8 gigs of RAM and no GPU before there was really any integration with anything so I was basically using the 'chat mode' in the console. Then someone introduced me to koboldcpp so I was running Llama 13B models on my gaming PC with a 1660 Super and 16 gigs of system ram.
I didn't just up and drop 5 grand on building a server out of the blue. It was a gradual progression.
>>
>>101592870
The more you buy the more you save
>>
https://github.com/ggerganov/llama.cpp/pull/8676

Llama 3.1 rope scaling finally merged
>>
The fix for L3.1's issues with context beyond 8192 has been merged into the llama.cpp master branch; it should be working properly now.
https://github.com/ggerganov/llama.cpp/commit/b5e95468b1676e1e5c9d80d1eeeb26f542a38f42

>>101592681
Its not brain damaged at 2.75 bpw?
>>
>>101592904
The more you buy the more seeing shivers down the spine hurts.
>>
>>101592681
Is it better than a 5bpw 70B? How much better?
It's tempting to sell my 3060 and buy a second 3090
>>101593061
lmao so true
>>
>>101589756
>>101590284
Calm down with the shilling.
>>
File: 1709992939780627.jpg (347 KB, 2250x1651)
My model ratings from recent tests for RP, run on 48gb vram

1 - Mistral Large (Mistral-Large-Instruct-2407-123B-exl2, 3.0 quant). Just very good at natural language

2 - Midnight Miqu - it's a slopmerge for RP and does its job

3 - Llama 3.1 (4.5 quant) - It clearly wasn't designed to be a chatbot; replies are accurate but very robotic. It beat Mistral Large on knowledge checks and coding though

4 - Nemo 12b, I don't know why this was even recommended to compete with the others

waste of time - commandr
>>
>>101592161
mikushitters and some guy named "petra"
>>
I think this is the best place to ask about it: is there a way/program to make an LLM identify and tag several (thousand) images? It doesn't have to be anything advanced, just tagging whatever it sees would already be a great help.
>>
>>101593186
Yeah, I'm pretty sure moondream 2 (a small and good model) has a Python script implementation; just make a loop and iterate over the folder you want to classify.
>>
>>101593186
the ponyfucker said he did some LLaVA work feeding it booru tags and asking it to describe the image to get a caption.
He is kinda a retarded schizo and it isn't clear that was a better way of training than just using booru tags though
>>
>>101593206
https://huggingface.co/vikhyatk/moondream2
here's the repository, the script is there
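If you'd rather roll your own loop than use the bundled script, the folder iteration itself is trivial; caption_image() below is just a placeholder for whatever model call you end up using (e.g. moondream2 as described on its model card):
[code]
# Walk a folder of images and dump {filename: caption/tags} to a JSON file.
import json
from pathlib import Path
from PIL import Image

def caption_image(img: Image.Image) -> str:
    # Placeholder: swap in your actual VLM call here.
    return "UNTAGGED"

tags = {}
for path in sorted(Path("images").iterdir()):
    if path.suffix.lower() not in {".jpg", ".jpeg", ".png", ".webp"}:
        continue
    try:
        with Image.open(path) as img:
            tags[path.name] = caption_image(img.convert("RGB"))
    except OSError as e:
        print(f"skipping {path.name}: {e}")

Path("tags.json").write_text(json.dumps(tags, indent=2))
[/code]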
>>
>>101592986
No. The only errors it does it a misplaced punctuation point once every 500 tokens or so, which is not much to complain about.

>>101593085
Despite my limited experience, I would say yes. Before Largestral, I would use Llama 3 70B finetunes for coom (New Dawn, Euryale). They were good, but had too much slop. With Largestral, no more spine shivers or any other GPT/Claudeisms. It's like I cured my model of its autism.
>>
>>101592964
>>101592986
Again some problem with llama.cpp tokenizer. Sane people should use transformers tokenizer.
>>
>>101593268
that literally has nothing to do with tokenization at all, it's about rope context scaling
>>
>>101593153
>waste of time - commandr
Stopped reading right there
>>
File: F-Gr7rLacAALRMV.jfif.jpg (245 KB, 2048x1937)
>>101593292
at the bottom of the message? Fucking retard
>>
I still haven't found good settings for Nemo. I don't like how moldable it is, or rather how super-focused it is on context patterns instead of instructions. For example, a different model (like Llama 3) will naturally give you lengthy responses (unless you tell it not to), no matter how long your messages are. Nemo, however, will mimic your responses, and if you aren't putting much text in your messages, it won't either.
>>
>>101592383
that's an extremely specific answer, thanks a ton
>>
>>101593219
>>101593206
Thank you, I'll take a look into it.
>>101593213
A shame how people tend to gatekeep these small things, I don't really blame him though, it's his work I suppose.
>>
>>101593303
he's mistral nemo please understand, they put their system prompts at the bottom
>>
>>101589265
I remember in December 2022 doomers saying local gpt 3 (DaVinci) was “maybe 10 years away”. I always knew these things were bloated as fuck.
>>
doomer here, i'm going to make a prediction and say that agi is maybe 100 years away. 1000 years for coomable agi that fits into 10gb vram.
>>
>>101593153
>Nemo 12b, I don't know why this was even recommended
Because of the allure of huge context length that was previously out of reach for people without much VRAM.
>to compete with the others
Assume people saying that were trolling or retarded.
>>
>>101593374
Summer Dragon still hasn't been surpassed though so...
>>
>>101593392
Back then 175B seemed impossibly huge. I can't believe I'm running models close to that size on a simple $3k rig at home now
>>
Is it just me or does Llama.cpp take longer to compile than it did a few weeks/months ago?
>>
Okay, so... the base Mistral Nemo model is much better at larger context sizes; the difference in understanding is massive. What causes this?
>>
>>101593463
What are you saying? You're getting better results with base than instruct with large chat histories?
>>
what does flash attention do?
>>
>>101593547
https://arxiv.org/abs/2205.14135
>>
>>101593452
It does now take longer with CUDA, make sure you instruct the build system to run multiple jobs in parallel, for example with. -j 8

>>101593547
Calculate a temporary matrix in small parts in fast but small memory instead of calculating and writing the entire matrix to large but slow memory.
This requires more calculations but on modern hardware the speed of calculations has been increasing much more than the speed of memory.
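A toy single-query version of that idea in numpy (not the actual kernel, just the online-softmax bookkeeping that makes the tiling possible):
[code]
# Process K/V in blocks, keeping a running max and running sum so the full
# attention score matrix never has to be materialized.
import numpy as np

def streaming_attention_row(q, K, V, block=128):
    """Attention output for a single query vector q against K and V."""
    d = q.shape[-1]
    m = -np.inf                   # running max of scores (numerical stability)
    l = 0.0                       # running softmax denominator
    acc = np.zeros(V.shape[-1])   # running weighted sum of V rows
    for start in range(0, K.shape[0], block):
        Kb, Vb = K[start:start + block], V[start:start + block]
        s = Kb @ q / np.sqrt(d)   # scores for this block only
        m_new = max(m, s.max())
        scale = np.exp(m - m_new) # rescale previously accumulated results
        p = np.exp(s - m_new)
        l = l * scale + p.sum()
        acc = acc * scale + p @ Vb
        m = m_new
    return acc / l
# Matches naive softmax(K @ q / sqrt(d)) applied to V, up to float error.
[/code]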
>>
>>101593513
Yeah. At larger contexts, instruct becomes dumb for me, skipping over events and getting completely lost in the plot, while the base model doesn't seem to have the same problem.
>>
>>101593452
It's super annoying, I used to rebuild it everyday before using it, now only do it every other weeks or if I need compatibility with a new model.
>>
>>101593463
You tested the base model? That's interesting.
I suspect >>101399248.
People's multiturn fine tuning data are constructed naively.
>>
File: 1707049543626270.webm (2.81 MB, 720x1280)
Largestral 2 is basically a non-dry and 10-15% smarter version of Wizard 2 8x22

At this point, there is no scenario that i test for that doesn't work very well with the model

Outside of external tool use and multimodality, is there anything else that a new model can really give when it comes to RP?

I don't think so, only speed.
>>
>>101593677
my brain looks like that (i use crack)
>>
>>101593677
What quants do you run of both models?
>>
I'm still using C-R+. Nothing has changed.
>>
>>101593699
q4
>>
>>101593690
based expert roleplayer
>>
Is it possible to use nemo 12b on koboldcpp? Docs say GGUF only, but has someone already converted it?
>>
>>101592087
He has a point in that having those tensors at a higher precision than the rest of the model makes the output better, yes, but that's something that most (all?) quants already do.
The whole meme began when he claimed that having those layers at full precision gave better results than having them at q8 or whatever, which was demonstrably false.
His whole "testing" was all vibes based and non-reproducible.
>>
>>101593836
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
>>
>>101593865
thx anon
>>
>>101593836
not really, you gotta either use a fork of koboldcpp or wait for the retard to implement the tekken token bs
>>101593865
nigger
>>
>>101592180
*cums on you*
>>
>>101593939
>you gotta either use a fork of koboldcpp or wait for the retard to implement the tekken token bs
>2 days ago
>https://github.com/LostRuins/koboldcpp/releases/tag/v1.71
>Merged fixes and improvements from upstream, including Mistral Nemo support.
You might be a little behind.
I don't blame you; I've been using llama-server directly for months now, there's no reason to use kcpp really, so I get it.
>>
>>101593939
>not really, you gotta either use a fork of koboldcpp or wait for the retard to implement the tekken token bs
are you mentally deficient?
>Merged fixes and improvements from upstream, including Mistral Nemo support.
https://github.com/LostRuins/koboldcpp/releases/tag/v1.71
>>
>>101593677
What's crazy about AI videos is that within the bizarre surrealistic nonsense each moment is still copacetic with the previous moment and the next moment. Truly nightmare fuel.
>>
idc dont use koboldcpp
>>
Just tested out 3.1 70B at IQ3_M (on latest llamacpp build). It's a bit faster than Largestral was at IQ2_M. Also does OK at the trivia question I threw at it, but it doesn't seem to be able to do the Castlevania question unlike full precision. Maybe if I go just a bit higher in quant.
>>
>>101594001
>I was just prentending to be tarded
>>
>>101593986
>there's no reason to use kcpp really, so I get it.
Actually, just to correct myself, there is one reason.
They still have support for multi-modal, I believe, whereas upstream nuked it pending a refactor.

>>101594013
How charitable to assume he was just pretending.
>>
>>101593725
Same but C-R



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.