/g/ - Technology






File: 1717631664840828.jpg (383 KB, 1024x1536)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101081984 & >>101069457

►News
>(06/18) Meta Research Releases Multimodal 34B, Audio, and Multi-Token Prediction Models: https://ai.meta.com/blog/meta-fair-research-new-releases
>(06/17) DeepSeekCoder-V2 released with 236B & 16B MoEs: https://github.com/deepseek-ai/DeepSeek-Coder-V2
>(06/14) Nemotron-4-340B: Dense model designed for synthetic data generation: https://hf.co/nvidia/Nemotron-4-340B-Instruct
>(06/14) Nvidia collection of Mamba-2-based research models: https://hf.co/collections/nvidia/ssms-666a362c5c3bb7e4a6bcfb9c

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: baatsune shiipu.jpg (61 KB, 768x768)
►Recent Highlights from the Previous Thread: >>101081984

--VNTL Leaderboard Update: GPT-4o Edges Out 3.5 Sonnet, Command-R+ Rises: >>101087721 >>101088034 >>101088551 >>101090846 >>101091915 >>101088740 >>101088070 >>101088470
--Simulating Emotions with Integrated Computational Model of Appraisal and Reinforcement Learning: >>101090073
--Mixtral Still Best for Quality/Speed Margin on 24gb VRAM Systems: >>101083202 >>101083271
--CPU vs GPU Bandwidth: Are CPUmaxxxers Right After All?: >>101087340 >>101087490 >>101087638 >>101087902 >>101087583
--Anon's Quest for the Perfect Quant+Inference Server Combo: >>101082958 >>101083121 >>101083208 >>101083276 >>101083328 >>101083787 >>101084117 >>101084306 >>101084747
--Testing Karakuri Chat's Toxicity and Offensive Language Generation: >>101086865 >>101086929 >>101087181
--Sonnet 3.5 Surprisingly Generates Working Code for Werkzeug Python Server: >>101084483 >>101084530 >>101084604
--Precautions when Ordering Gigabyte MZ73-LM0 with AMD EPYC Bergamo Processors: >>101083080 >>101083668 >>101084505 >>101084300 >>101085073 >>101085195 >>101085453 >>101085515
>>101085587 >>101085619 >>101087993
--Running LLaMA 3 70B on a Single 4GB GPU with AirLLM: >>101082164
--Mikubox Upgrade: Diminishing Returns?: >>101088802
--Intel's Upcoming Processors to Shake Up the GPU Market: >>101088891 >>101088995 >>101089068
--Exploring Customizable Response Formats for Large Language Models: >>101090629 >>101090695 >>101090845
--Current Local LLM Status: Meta, Mistral, DBRX, Cohere, and TIIUAE: >>101087844 >>101088705 >>101089151 >>101089215
--AI Models Fail to Meet the Anime Character Challenge: >>101084936
--Turbocat's New Model: LLaMA 3 Turbcat Instruct 8B on Hugging Face: >>101082832 >>101082906 >>101083355 >>101083535 >>101084750 >>101083498 >>101083559 >>101083662
--Miku (free space): >>101084936 >>101085298 >>101086061 >>101086175 >>101086831 >>101087433 >>101088471

►Recent Highlight Posts from the Previous Thread: >>101081988
>>
File: 1690649714188633.jpg (1.13 MB, 3200x4000)
>>101094602
hello /lmg/
>>
File: IMG_8090.jpg (252 KB, 1482x1864)
>>101094655
hello miku
>>
File: file.png (6 KB, 288x114)
what do you guys use language models for?

I like to play around with giving them different kinds of reply/memory logic. In the picrel the bot is on a timer. After a message is sent, it checks to see if it should reply again or not.
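Roughly like this, if anyone's curious. A stripped-down sketch of the timer check, assuming an OpenAI-compatible local server (e.g. koboldcpp on :5001); the endpoint, follow-up prompt, and fixed one-minute timer are all placeholders, not the actual code in the picrel:

```python
# Sketch of the "should I reply again?" timer; endpoint and prompt are assumptions.
import time
import requests

API_URL = "http://localhost:5001/v1/chat/completions"  # assumed local endpoint

def should_follow_up(history: list[dict]) -> bool:
    """Ask the model for a plain YES/NO verdict on sending another message."""
    messages = history + [{
        "role": "user",
        "content": "OOC: Should the character send a follow-up message now? Answer only YES or NO.",
    }]
    r = requests.post(API_URL, json={"messages": messages, "max_tokens": 3, "temperature": 0})
    return "YES" in r.json()["choices"][0]["message"]["content"].upper()

history = [{"role": "assistant", "content": "hey, you still there?"}]
time.sleep(60)  # stand-in for the timer in the screenshot
if should_follow_up(history):
    print("bot decides to send another message")
```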
>>
>>101094602
I've been trying out magnum opus, any anons have sampler settings to recommend for it?
>>
>>101094610
>AirLLM
I might be insane, but I think I remember that from a while back.
Anybody tried running that?
How hard would it be to jerryrig a python OAI compliant server using the sample inference code?
>>
>>101094872
Without reliable function calling API endpoints, nothing meaningful to be honest. Occasionally have it generate short stories to fap to. Ask it to give me a summary of a concept, but that's it.
>>
>>101094964
>Without reliable function calling API endpoints
i'm using ollama to make json outputs with true or false for my use case and its pretty reliable. i always get a true or false but sometimes the llm doesnt properly follow the prompt and will say false when it should be true
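something like this, roughly (model name and prompt are placeholders; "format": "json" only forces valid JSON out of ollama, it doesn't stop the model from picking the wrong boolean, as noted):

```python
# Rough sketch of the ollama true/false JSON approach.
import json
import requests

resp = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3",   # whatever model you have pulled
    "prompt": ('Does the following message require a reply? '
               'Answer only as {"result": true} or {"result": false}.\n\n'
               'Message: "ok see you tomorrow"'),
    "format": "json",
    "stream": False,
})
verdict = json.loads(resp.json()["response"])["result"]
print(verdict)  # usually sane, occasionally false when it should be true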
>>
File: image.png (104 KB, 790x384)
can your models do this?
>>
>>101095070
llama3: i cannot create this pee pee poo poo, im not gonna bite or whatever, raycism is le bad, nignogs are le good even if they are killing everyone around them
>>
File: 1531597776422.jpg (34 KB, 417x417)
So what's the most "peak AI" card out there? Like you're trying to show someone how cool AI can be, and that's the card you use to mindblow them. Of course, paired with a sufficiently good model though.
>>
>>101095070
You're trying too hard to fit in.
>>
>>101095198
no one cares, fuck off
>>
>>101095184
>card
Is that all AI is to you?
>>
>>101095184
no such thing, all AI models are censored to some extent, you can't have fun or "peak AI" card.
>>
>>101095184
Bitch control app is always my goto
>>
>>101094878
Come on anon bros, help a coomer out. Good sampler settings for magnum opus, or let's just say Qwen 2 72B Instruct? I saw the Nala anon having decent logs with Magnum a while back, nothing really special, but I'm hoping to get a bit of variety from my go-to Miqu.
>>
Do you people still call these statistical models AI? Why?
>>
>>101095632
because they fulfill the definition of an AI, regardless of how they work inside?
>>
>>101095632
Because language is descriptive, not prescriptive. The word AI is now commonly used to refer to the implementations of these statistical models, and thus it is what we use when discussing those models.

If anything, it is the researchers that need to find a new word to describe what AI used to describe.
>>
>>101095412
I've been getting good results with a simple temp 1, min p 0.08, freq/pres pen as needed setup
as usual with samplers I think there are a lot of setups that will work fine, my one meaningful piece of advice is do not crank the temp with magnum, it's not overbaked and doesn't really need it. I noticed a lot of diminishing quality the further I pushed the temp above 1 because the model kept getting pushed down schizo nonsense routes that really degraded the quality, especially with dialogue. you get a pretty good variety of responses on rerolls even at lower temps so I don't think there's very much benefit to it.
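if it helps, this is roughly what those settings look like as a koboldcpp-style API payload (field names are what I remember from its /api/v1/generate, so double-check against whatever backend you actually use; the prompt is a placeholder):

```python
# Sketch: temp 1, min_p 0.08, light rep pen, sent to a koboldcpp-style endpoint.
import requests

payload = {
    "prompt": "...your formatted chat prompt here...",
    "max_length": 300,
    "temperature": 1.0,   # keep it around 1, don't crank it for magnum
    "min_p": 0.08,
    "rep_pen": 1.05,      # stand-in for "freq/pres pen as needed"
}
print(requests.post("http://localhost:5001/api/v1/generate", json=payload).json())
```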
>>
File: 1695374109334474.png (663 KB, 752x701)
>>101095370
it's literally everywhere
>>
File: 1526815789912.gif (391 KB, 640x360)
>don't have much VRAM, decide to try and run CR+ at q6 mostly in RAM just to see what it's like
>get 0.9 t/s
Haha...
>>
>>101096105
I get 0.3
with CR, not +
at q4
>>
It's over. Nous Research got hit with a Cease & Desist letter.
https://x.com/NousResearch/status/1804219649590276404
>>
>>101096307
What fucking content?
Is there image and audio generation involving likeness of their content?
Just lyrics?
>>
>>101096307
lol no they didn't
>nouse
also
>CONFIDENTIAL
lmao
>>
File: file.png (6 KB, 336x157)
>one letter shorter, ignoring the period
what the fuck
>>
>>101096307
There's no specific misdeed alleged in that letter. Looks like these retards are just scattershot mailing this letter to every training group without even bothering to determine if their content was used.
>>
>>101096307
lol, what even is the point of this letter? It doesn't sound like it's demanding anything (unless it's on a following page). Is Sony just blanket mailing any AI research org they can find? Even by globohomo megacorp greedy fuck jewish lawyer standards it doesn't really make any sense.
>>
>>101096307
copyright was a mistake
>>
>>101096105
How. I get 0.4t/s...
>>
What the fuck, stheno 3.2 blows mythomax out of the water for story completion. I've been gone for like 6 months and have finally been rewarded as a vramlet. All I use models for is modifying erotic stories I already enjoy.
>>
>>101096517
Buy an ad.
>>
>>101096592
sorry I forgot I was allowed to express I actually enjoy something. I will return to being a jaded husk.
>>
I really want to use ollama but the fact that I can't just load my .ggufs without having to go through hoops is frustrating. Is there also no way to change the system prompt and parameters like there is in ooba?
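(For reference, the hoop in question is a Modelfile wrapping the GGUF, and as far as I know that same file is also where the system prompt and parameters go. A minimal sketch; the path, model name, and values are placeholders.)

```python
# Wrap a local GGUF for ollama via a Modelfile, then register and run it.
import subprocess
import textwrap

modelfile = textwrap.dedent("""\
    FROM ./L3-8B-Stheno-v3.2.Q8_0.gguf
    SYSTEM "You are an uncensored roleplay assistant."
    PARAMETER temperature 0.9
    PARAMETER num_ctx 8192
""")
with open("Modelfile", "w") as f:
    f.write(modelfile)

subprocess.run(["ollama", "create", "stheno-local", "-f", "Modelfile"], check=True)
subprocess.run(["ollama", "run", "stheno-local"], check=True)
```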
>>
Never tried those stheno and euryale ones. How is euryale compared to magnum?
>>
>>101096517
One has to demonstrate samplers and other settings when making such claims.
>>
>>101097032
Kind of fried on the OOHHHH I'M CUUUMMING. I didn't try the 8B.
But it has less repetition than Magnum.
>>
>>101094908
I didn't test it but honestly AirLLM seems like a total meme.
I don't see the advantage over just running the model from RAM.

>>101096307
I wonder if and when there will be actual court cases that settle whether or not training on something counts as a copyright infringement.
Though I think given the competition between countries when it comes to machine learning there will be an incentive to overrule any such cases with a law that explicitly permits training (like Japan did).
>>
>>101095632
Ignorance. I for one am waiting for JEPA cat AI
>>
>>101094908
Kobold already had AirLLM's "Load 70b with 4GB VRAM" long before it even came into existence.
>>
https://x.com/ylecun/status/1804184085125857687
He's laughing at us again...
>>
>>101097409
He was supposed to be our saviour. It's over, AI is a joke.
>>
>>101097425
China is the savior, leaving everyone else in the dust
>>
>>101097409
He's laughing at ALL llms, including proprietary ones.
>>
/lmg/ and /ldg/ frenship
>>
>>101097409
what's the solution for this though?
>>
>>101097130
>Kind of fried on the OOHHHH I'M CUUUMMING.
still better than
>oh, oh, mistress
>>
>>101097635
Mcts
>>
>>101097640
Hi, Sao. Which model says "oh, oh, mistress"?
>>
>>101097635
abandon language-only models. multimodality is a requirement. that's what he laughs at. models trained with a ground truth of human slop will always be limited to human slop
>>
>>101097651
>Sao
What?
>Which model says "oh, oh, mistress"?
The biomechanical one.
>>
File: she.png (32 KB, 931x281)
>she
>>
so when I currently use magnum downloading euryale wouldn't be a straight upgrade, just different problems
>>
File: sddefault.jpg (37 KB, 640x480)
>>101097745
>Does he know?
Should I reply saying that I'm a man and my feelings are deeply offended by his misgendering?
>>
>>101097806
Let it be. You don't want your pr to get closed again. Does he really not understand the issue yours is trying to solve?
>>
>>101097409
Nooo my 50 trillion tokens... Amounts to this...
>>
>>101097635

gpt-4-turbo-2024-04-09
>>
>>101097806
you should, would be funny to see that kek
>>
>>101097888
>Ah, the old river crossing riddle!
So gpt4 has been trained with this solution too
>>
>>101097409
I think the point lecun is making is right but he is arguing in bad faith, the AI gets this bad wrong because its overcooked with this riddle, not because it can't reason
>>
River crossing dataset with thousands of variations of the problem when?
>>
>>101097950
LLMs cannot reason either way.
>>
>>101097995
Neutral networks aren't much more than if else
>>
>>101097955
>>101097995
It should actually be simpler than that.
What you would actually want is variations on making logical connections between separate discrete concepts. An 'analogies' dataset if you will.
>>
>>101098119
Thanks for the insight. 2mw until AGI, then?
>>
>>101098261
I mean I could probably do it in about 2 days if I cared that much.
>>
File: Sthenose.png (186 KB, 1874x860)
>>101096517
>Stheno
More like
>sTheNose
>>
File: claude-1-lmsys.png (65 KB, 1521x329)
How big is Claude-1? Is it really just a well-tuned 13b like some were saying?
>>
>>101098436
that's what happen when you pretrain your model on leddit and wokeipedia
>>
>>101098469
We don't know. Anthropic never publishes any technical details about their models.
>>
File: file.png (275 KB, 1255x498)
>>101096307
Sony has sent literally every sufficiently large AI research org letters like this.
It is pathetic and ridiculous.

https://www.nbcnews.com/tech/tech-news/sony-music-group-warns-700-companies-using-content-train-ai-rcna152689
>>
Is there anything like stheno at 34b? Like a model that punches way above its weight for RP.
>>
>>101098669
no, we are in the era of 8b or 100b, there is nothing worthwhile inbetween
>>
>>101098687
maybe Meta Chamelon 34b will save the day?
>>
>>101098710
Lol.
>>
>>101098764
:(
>>
>>101098710
Llama 2 tier
>>
I'm feeling a major release for next week.
>>
So, I'm using KoboldCPP to contribute for some Kudos to spend on prioritization for 70B+ models I can't host myself. I'm using the same API key in the horde tab in KoboldCPP as I use in SillyTavern. When I click on Show My Kudos in SillyTavern it says I have 25 Kudos, and when I navigate to lite.koboldai.net and use my API key there it shows a Kudos balance of 25 too, so that part is consistent.

When I click Manage My Workers, my worker shows up and says it has 100K Kudos. How do I make use of them?
>>
>>101098833
I'm not, but hope you're right
>>
>>101098833
I'm going to release majorily right now.
>>
>>101098687
Is there a good reason for that?
It's like model quality is on a cubic power curve.
LLM is an RPG and you must grind exponentially more B to level up just to get a few more skill points in slop.
>>
How much context can you stretch l3 70b tunes to without breaking them, and what alpha value for that context.
>>
Say.... didn't google remove all the naughty stuff from gemma's pre-training corpus? And since the slop comes from all the naughty human writing found in the pretraining datasets wouldn't that theoretically make it the perfect blank slate for a slop-free ERP tune?
>>
Is the 4060TI 16GB actually the cheapest and most efficient RTX GPU to run models locally right now? I know there's the A770 16GB but are Intel Arc GPUs even there yet in terms of stability? Isn't the A770 also a bit of a power hog?
Maybe it's better to just wait for battlemage or 50 series? From what I see from people testing AMD is just shit in AI, even the A770 is beating a lot of their cards.
>>
>>101098938
Smut is not the only place where you find shivers.
>>
>>101098956
True. But the overall shiver density in other forms of fiction should at least be lower
>>
>>101097160
>I don't see the advantage over just running the model from RAM.
That was my thought as well. I imagine that there's a LOT of data movement that can cause tons of overhead. Either that or they are just running it off RAM and quoting the 4GB for the KV cache, like llama.cpp does with 0 offloaded layers and CUDA.
Still, I'll give it a try.

> Though I think given the competition between countries when it comes to machine learning there will be an incentive to overrule any such cases with a law that explicitly permits training (like Japan did).
My thoughts exactly.
In an arms race the one with the least restrictions has the opportunity to get ahead first or further, all other things being equal of course.
I can see something like "as long as the final result doesn't reproduce copyrighted material it's legal" or something.
>>
>>101098944
I got that one, but it's not recommended here because memory bandwidth
>>
>>101098924
16k
https://desmos.com/calculator/ffngla98yc
>>
>>101099021
Is the bus size really that important? I feel like buying anything less than 16GB is a bad idea since even an 8B model like Stheno is pushing 10GB with 8192 context size and 512 batch size.

Also, I just can't figure out the quants. I know bigger number = less retardation and going under 4 is basically a lobotomy, but I'm reading tons of conflicting info: people saying you should always just go for Q8 if your vram can fit it, but then there are also people saying anything larger than Q5_K_M is a waste of space. Now there are also the weighted/imatrix quants, which are new; should I always go for those instead now if available?
I tried looking at what other people are hosting on silly tavern but it looks like most people delete their quant tags and stuff.
>>
File: KL-divergence_quants.png (111 KB, 1771x944)
>>101099118
In my experience, you want at least > 4bpw.
If you are going lower than that, you are usually better off using a smaller model with a higher quant.
q8 is pretty much the same as q6 in practice, and Q5s do output different results, but not necessarily worse either, with "worse" being really hard to define due to all the subjectivity of using these things for RPing, mostly.
Basically, my experience more or less aligns with the chart.
>>
I'm using the Llama-3 Roleplay V1.9 preset with a little bit of tweaking. I've found that if you don't talk to the bots and just let them interact with each other in a group chat a handful of times, they end up in a loop, repeating their lines and going nowhere. Is that because I have response tokens set to 512? I started out with 256 but the replies kept cutting off mid sentence.
>>
I wish there was a way to sample specifically the first token such that, if the token chosen from the first batch is an EOS, it chooses the next non-EOS token instead.
I realize that a message generated like that would most likely be schizo as fuck, but I'd love to at least have the option.
On another note,
>Message #118, mention the name of an NPC that's not part of the current story
>Message #212, character names said NPC
Alright, 32k context works with L3 8b.
Using yarn with freq-base of 5000000.
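For the skip-EOS-on-the-first-token idea, a minimal sketch with HF transformers (min_new_tokens=1 does roughly the same thing; this just makes the mechanism explicit, and the model name is a placeholder):

```python
# Mask EOS while generating the very first new token so sampling falls
# through to the next-best candidates instead of ending immediately.
import torch
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          LogitsProcessor, LogitsProcessorList)

class NoEosOnFirstToken(LogitsProcessor):
    def __init__(self, eos_token_id: int, prompt_len: int):
        self.eos_token_id = eos_token_id
        self.prompt_len = prompt_len

    def __call__(self, input_ids, scores):
        # Only intervene at the first generation step.
        if input_ids.shape[1] == self.prompt_len:
            scores[:, self.eos_token_id] = float("-inf")
        return scores

name = "meta-llama/Meta-Llama-3-8B-Instruct"
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.bfloat16, device_map="auto")

inputs = tok("The story continues:", return_tensors="pt").to(model.device)
processors = LogitsProcessorList(
    [NoEosOnFirstToken(tok.eos_token_id, inputs["input_ids"].shape[1])])
out = model.generate(**inputs, max_new_tokens=128, do_sample=True,
                     logits_processor=processors)
print(tok.decode(out[0], skip_special_tokens=True))
```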
>>
>>101097833
>>101097926
Like an angel and a devil on my shoulders. I'm not in trolly mood right now, so I won't bother him.
>>
>>101099118
Quant mood board.

Q8 seems to be the peak. It avoids the FP/BF16 drama, and seems to be the limit of useful bits.
Q6 series don't metric quite as well but it seems to be under the noise floor.

Then we get into the drama zone. Summary:

-Bigger Q is better.
-Q_K options beat non-K options.
-IQ options are more compromised than a Q_K or non K but might be needed to trade some performance for fitting VRAM.
-There are a lot of K's, K_XXS, K_XS, K_S, K_M, K_L, and I've heard of something like K_NL and K_P but I've never seen one.
-Recently there's some buzz in the thread about K_S and perhaps the older _0 quants being better at factual details than K_M. This needs more testing but if your use case requires accuracy, an S might be more detailed but less creative than the parallel M. That said, small S's make mistakes and it seems at Q6, there is no S/M issue to think about and truthiness seems to be as good as it'll get anyway.

Oh, and don't conflate IQ quants with iMatrix. They're different things.
>>
>>101099286
No, that's just how it is. LLMs don't have creativity, if a pattern emerges it gets amplified to oblivion.
>>
File: 1701258689089547.jpg (55 KB, 785x1051)
>>101099286
You need an element of randomness to shake them once in a while. I think the random tangent that some anon gave here would work great. Just put that at a depth where you're usually seeing the loop.
https://pastebin.com/JbchCSHU
>>
>>101099286
You want to add manual randomness to your prompt using the {{random:}} and {{pick:}} macros, for the reason >>101099405 gives.
These things are crazy pattern matching machines, and sometimes they'll latch onto a pattern and run with it.
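Concretely (SillyTavern macro syntax as I understand it; the options themselves are just placeholders): an Author's Note line like `[Current scene detail: {{random:rain starts,a stranger walks in,the power flickers}}]` rerolls a new nudge every generation, while `{{pick:tavern,market,docks}}` rolls once and then stays consistent for the rest of that chat.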
>>
>>101099345
>It avoids the FP/BF16 drama
wha is the "FP/BF16 drama"?
>>
>>101099179
what exactly is lost with quality in these charts?
even at q2, large models are coherent just the same, they use the same dumb language like shivers, they speak the exact same way 'a mix of x and y', muh bonds
as long as you aren't getting literal unintelligible gibberish from a model i dont think these charts really mean anything
>>
>>101099345
So, theoretically I should be using Q8 when space is a non-issue, and then maybe K_S over K_M, but what happens when iMatrix enters the discussion?

For example, for Sthenos V3.2 there is a recommendation for i1-Q4_K_M; should one go for that compared to, say, the Q5_K_S or even the Q8?
>>
>>101099540
BF16 is the original training weights, switching to FP16 makes it as braindead as Q8. Some shitty consumer hardware doesn't have support for BF16.
>>
>>101099540
bf16-trained models don't quantize very well which is why llama3 quants take such a huge hit even at q8
>>
>>101099540
Not that anon, but I think some models are released in one format, which then needs to be converted to the other format before quanting it.
And there's differences in the precision of each format which in theory could change the characteristics of the weights.

>>101099568
It's not about coherence or accuracy of information. Quality could be defined as how close to the original weights the output is. So the original unquanted model could be dumb and output a wrong answer to a prompt, but a quant that outputs the exact same answer with the exact same token probabilities would be at 100% quality for example.
That's my understanding at least.
A quanted model that outputs "better" (more accurate, more "intelligent", whatever) responses can be nothing more than a coincidence.
>>
>>101099589
>Some shitty consumer hardware
As well as all the pre-ampere workstation/server cards like the RTX8000 or P40/P100.
>>
Does losing quality have anything to do with some models thinking of only the furry easter egg bunny suits when bunny suits are mentioned in a bar setting? Like I'll start out with, I'm in a bar, (blah blah blah details), I sit down and order a whiskey on the rocks and take a look around at the girl servers prancing around clad in bunny suits and some models will reply with like oh user takes in the scenery, all the bar girls in pink furry bunny suits hopping around.
>>
>>101099540
FP16 is classic, with 10 bits in the mantissa.
BF16 is new, with 7 bits in the mantissa.

So BF16 has fewer significant figures but a much wider exponent range (the same range as FP32). BF seems to be the preference for gradient work. But it also means you have literally 7 significant bits, so quants are already in trouble, while FP16 starts you with 10 and you can Q8 reasonably.

The important thing is knowing that you don't want to change between FP and BF or you lose bits and gain error either way you go.
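For anyone who wants to see the tradeoff, a quick torch check (the two values are picked to sit right on the interesting edges):

```python
import torch

big = torch.tensor(70000.0)
print(big.to(torch.float16))    # inf    -> past fp16's ~65504 ceiling
print(big.to(torch.bfloat16))   # 70144. -> in range for bf16, just coarsely rounded

tiny_step = torch.tensor(1.0009765625)   # 1 + 2**-10, the last step fp16 can resolve above 1.0
print(tiny_step.to(torch.float16))   # 1.0010 -> kept, fp16 has 10 mantissa bits
print(tiny_step.to(torch.bfloat16))  # 1.     -> rounded away, bf16 only has 7
```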

>>101099572
Maybe. Apparently some Q8 is actually less but with padding because people didn't understand that Q8 could result in a Q6 kind of size if there wasn't meaningful bits to retain.

I have not heard anybody testing iMatrix's effects on model truthiness.

>Sthenos
You can test them and inform us. For chat, Q is king. It's only tricky factual details where S seems to have a particular advantage (I've mentioned many times I use a music theory question to test models and nothing K_M at Q5 or worse has passed, but some K_S models have.) but it also seems to be significant, with a Q4_K_M being beaten by Q2_K_S in S-Anon's test.
>>
>>101099118
you get performance proportional to memory bandwidth. if you need something new and want to fit as big a model as possible, the 4060 ti 16gb is a good option. it's just in an awkward spot to blindly recommend.
4070 ti super gives you 2.3 times the performance, 3090/4090 gives 3.5 times the performance.
>>
>>101099681
>Apparently some Q8 is actually less but with padding because people didn't understand that Q8 could result in a Q6 kind of size if there wasn't meaningful bits to retain.
I'm pretty sure that's only for exl2, and doesn't apply to ggufs, unless you have a source for it being a thing for lccp/ggufs too
>>
>>101098888
No, companies could train midrange models if they wanted to. llama1 had a linear curve from 7b to 13b to 30b to 70b.
>>
>>101098888
Cutting cost. You have heavyweights for companies and lightweights for consumers (the end goal was to run them on phones).
>>
>>101099708
I don't know the implementation details of the Q8's being padded to make them look like they're appropriately larger than 6's. Someone mentioned that recently so I mentioned it here because if Q8 quants can safely discard more irrelevant bits, then it means a choice between Q6 and Q8 may be more significant for some models than others.

>>101099720
So it's more about the cost of training models versus the expected demand, knowing that normies will take the small one and say "wow my computer is writing" and the hyper wealthy turbo chads are already demanding much larger models to fill their terabytes of VRAM and make their 1.21 gigawatt waifus slightly more quickly process every bit of written knowledge ever a brazillion number of times to ultimately say "Do you think I'm kawaii, sempai? u~guu"
>>
What merge of Stheno should I look into so it's not such an easy pushover? I mean it's writing better than a lot of the models I've been playing with, like I'm pretty sure I like it more than Nymeria and Poppy_porpoise, but I feel like Stheno is a bit too easy to push over.
>>
>>101099720
It really comes down to the fact that the people pretraining the base models don't give a rat's ass about quantization. Because when you really think about it:
8B = fits perfectly on 24GB graphics card (aka at home hobbyist) in FP16, leaving headroom for display out etc.

13B = only slightly more than half fills a 48GB Workstation card and is too big to fit on a 24GB card.
It's a mathematical odd one out.
34B = 80GB enterprise card. BUT people with access to enterprise hardware would all just rather multi-GPU and run 70B at that point anyway.

Quantlets BTFO
>>
>>101099811
>but I feel like Stheno is a bit too easy to push over.
Oh yeah, I love the model but it's a happy and compliant kind of gal for sure.
I haven't tried much to prompt around that aside from a guro rape test to see how far I could push it with just OOC, which was pretty far but the model got really dumb also, so that could be something you could try.
>>
>>101099811
If you're looking for a model that will play c.ai levels of hard to get I would say DeepSeek-Code-V2-Instruct is your gal.
>>
>>101096517
>Euryale is too retarded to actually use
VRAMchads...we lose again...
>>
>>101099889
Yeah, was testing out some dom cards from chub to see how the model handles and sometimes, just standing there and not following any orders was enough to reverse them.

>>101099901
>DeepSeek-Code-V2-Instruct
Hmmn I've never used c.ai, but looking at the huggingface page even the Q4_K_S is 134gb, that's more than my system ram (128gb) I don't think I'll be able to play around with this...
>>
>>101099901
Is Coder really better at RP than DeepseekV2-Instruct?
>>
>>101099901
>DeepSeek-Code
Isn't there a light version of that?
How does it perform?
>>
>>101099978
haven't tried chat yet. But I will at some point
>>101099989
The light version is too retarded for RP.
>>
File: IMG_0807.jpg (56 KB, 490x480)
Two years later… Did they have some kind of special sauce? How many parameters were they running?

I remember people saying local cai was never ever going to happen, that they were using LAMDA and that you’d need 300b for the same experience.
>>
>>101099978
No, I bet this idiot never tried it
>>
>>101100003
>The light version is too retarded for RP.
Damn, that's sad.
I'll still try it for myself, of course, but it's good to know other experiences to compare.
>>
>>101100004
c.ai was garbage and people are only remembering it fondly due to confirmation bias.
>>
>>101100004
>Two years later… Did they have some kind of special sauce? How many parameters were they running?
Around 180B, if I'm not mistaken.
>I remember people saying local cai was never ever going to happen, that they were using LAMDA and that you’d need 300b for the same experience.
They weren't entirely mistaken. No matter the amount of cope in this general, the 70B models are nowhere close to the early C.AI sovl.
>>
>>101100004
>Did they have some kind of special sauce?
They had good datasets, like really good. Fully human. 0% GPTslop. 0% assistantslop.
>>
>>101100004
The special sauce was the RP/wiki tune instead of common crawl.
>>
>>101099963
> some dom cards from chub
Examples? As a model maker I try to test as broadly as possible but it's hard to cover all corners.
>>
>>101099963
>>101099989
I'm more a ramlet. One turn on i1-IQ3_XXS takes double digit minutes it's so bad. I even tried the i1-IQ1_S that's still 44GB and it was too lobotomized to remember words in the prompt.

There is a Lite but it's dumb. Even Q8 is worthless for chat. There really needs to be a middle ground.

I'm retaining it only for code testing later, maybe Lite is completely code focused and still has some value there, but I'm not getting my hopes up.
>>
>>101100090
In other words:
50% reddit
50% RP forums/discords
>>
>>101100108
There's Dominatrix Teacher, some female Santa Claus, some female boss card named Anya, and I guess the FBI-Chan meme card.
I don't really play with femdom cards that much, but they are the fastest way to test a model's resistance.

What I'm really trying to do is find a good fantasy world lorebook I can just drop into group chat and have some comfy isekai adventures with some fantasy character cards. I've seen the spark of the possibilities, and it can't come fast enough.
>>
>>101100004
Secret sauce was actually designing the model for roleplay. The model is probably very undertrained and dumber than gpt 3.5
>>
>>101100004

Literally pretrained for roleplay. That's how.

Good for the casual users and all, but it's utter shit at coding, context (they use MQA), and everything else that corporations care about, unfortunately.
>>
>>101100004
unironically they trained it on a discord dataset, so the conversations feel more organic, like between two real people
any other corpo is just training assistants while they went for a chat buddy route
>>
>>101100312
Corporations are fine with a 7B RAG model with 1M of context.
>>
>>101100004
For all the hundreds of millions of dollars in funding they got it's great that nobody else wants to train a base model entirely on actual human interactions and characterization. Wasn't meta discussing releasing an RP model down the line? Or is that just going to be trained on 100% literotica slop vs 50%?
>>
>>101100004
>>101100038
>>101100090

>>101100036
so nothing of importance?
>>
>>101100549
It's important that we COULD be playing with local cai, but companies simply choose not to enable us and instead vomit out either useless assistants or gigantic models nobody can run, all while saying they're pro-open source. It's like throwing an anvil at someone drowning instead of a life preserver
>>
File: moemoe.png (10 KB, 1237x69)
Committed. Congrats anon!
>>
>>101099040
>https://desmos.com/calculator/ffngla98yc
Yeah I know about the alpha calculator, but wouldn't it scale differently, since that calculator is for 4k context models and L3 is 8k? Considering that, 8k to 16k would technically be doubling on L3... so 2.6 alpha?
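Working it out as a back-of-the-envelope sketch (assuming the exllama-style NTK formula base' = base * alpha ** (d / (d - 2)) with head dim d = 128 and L3's 500k rope base; treat the numbers as a starting point to test, not gospel):

```python
def rope_base_from_alpha(alpha: float, base: float = 500_000.0, head_dim: int = 128) -> float:
    # NTK-aware scaling: the alpha value maps to a larger rope frequency base.
    return base * alpha ** (head_dim / (head_dim - 2))

# L3 is natively 8k, so 16k is a 2x stretch regardless of what a 4k-calibrated
# calculator assumes; ~2.6 alpha is the usual rule of thumb for a 2x stretch.
print(rope_base_from_alpha(2.6))  # ~1.32e6 as the equivalent rope_freq_base
```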
>>
Never before have I seen a model this cucked.
>>
>>101099291
What would that be if using alpha value for EXL2 inste of rope?
>>
>>101100648
just what you love
>>
I feel like Poppy Porpoise is pretty dumb, doesn't understand a blindfold covers eyes and blocks vision, while also failing to understand what birth control is.
>>
File: hmm.png (522 KB, 850x788)
>>101100566
>>
>>101095646
which is what?
>>
>>101100566
Instead they just leave us to drown
>>
>>101099589
>switching to FP16 makes it as braindead as Q8
Switching from bf16 to fp16 may lose precision if the bf16 values are outside the range fp16 can represent. But if the bf16 exponent is inside the fp16's exponent range, there's literally 0 quality loss (going in that direction).
>>101099591
>source: my ass
Everything in the last few years is a "bf16-trained model". That is to say, trained using bf16 operations, but the underlying weights are kept in fp32, and each gradient step is accumulating into the fp32 copy of the weights. Llama 3 being bf16 just means they saved those fp32 weights as bf16 instead of fp16. There's nothing about bf16 training that somehow makes the distribution of the weights significantly different.
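A tiny torch sanity check of the in-range claim (the 1e5 value is only there to show the case that actually loses information):

```python
import torch

w = torch.tensor([0.0123, -1.5, 3.14159], dtype=torch.bfloat16)   # typical weight scale
print(torch.equal(w, w.to(torch.float16).to(torch.bfloat16)))     # True: bit-exact round trip

w_big = torch.tensor([1e5], dtype=torch.bfloat16)                 # outside fp16's range
print(w_big.to(torch.float16))                                    # inf: the genuinely lossy case
```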
>>
>>101100038
/aids/ SD vibes here
>>
>>101097409
Us? Who's us? I'm laughing myself.
>>
>>101100483
Not really:
2024-05-24 - FT.com: Meta and Elon Musk’s xAI fight to partner with chatbot group Character.ai https://archive.is/AB6ju
>>
>>101100648
>how do i kill all children of a process?
>i'm calling police now, anon
>>
>>101100004
Big model and training on fanfics, chats and RP probably did most of the job. At the time it was hinted to be in the GPT-3 size range or so. They also had some sort of quasi-realtime RLHF, perhaps using vectors or something like that.
>>
File: stew.png (21 KB, 428x661)
Is there something I'm not getting here?
I'm currently fiddling around with Merged-RP-Stew-V2-34B.i1-Q4_K_M, and according to the calculator it should be well within my vram limits, but it's taking upwards of 153s to reply, which makes me suspect it's doing something with system RAM?
>>
>>101100566
The only local model that was close (or at least closer than the rest of slop) to cai experience was Stheno for me. Still not the same tho
>>
>>101100992
Well did you load all layers into vram?
>>
>>101101029
I didn't touch the settings on KoboldCPP which is 200 GPU layers?
>>
>>101100992
use exl2 for full GPU inference
also for GGUF in most UIs you have to manually set how many layers you want to put into the GPU (so if the model has 34 layers, for example, you should put 34; but like I said, for full GPU inference use EXL2 instead)
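concretely, something like this from the command line (flags as I remember them from koboldcpp / llama.cpp, so treat it as a sketch and check --help on your build; pick one of the two):

```python
import subprocess

# koboldcpp: offload everything (overshooting the layer count is harmless)
subprocess.run(["python", "koboldcpp.py", "--model", "model.Q4_K_M.gguf",
                "--usecublas", "--gpulayers", "999", "--contextsize", "8192"], check=True)

# llama.cpp server equivalent
subprocess.run(["./llama-server", "-m", "model.Q4_K_M.gguf",
                "-ngl", "999", "-c", "8192"], check=True)
```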
>>
>>101100992
first thing first, delete that shitty model and download something reasonable
>>
>>101101070
It defaulted to 200 layers, and my vram was maxed out, anyways I'll keep that in mind next time I fiddle with a 34B model

>>101101114
The rp stew was hyped up elsewhere, so I decided to try it out, but yeah I've already got it to loop itself like a broken record on msg #10~16 when I asked it to do something it didn't like. Pretty crappy.
>>
Yann LeCun is literally becoming a joke.
It's safe to say he is out of the AI race and Llama is done for.
>>
>>101101148
just try this https://huggingface.co/Sao10K/L3-8B-Stheno-v3.2
with recommended sampler settings, it's a small model but it's hard to find anything better <24GB to be honest
>>
>>101101236
very organic post Sam
>>
>>101101266
yann lecuck is jealous of openai's success.
>>
>>101101236
Llama literally has nothing to do with him, except for maybe the decision to release the weights.
>>
>>101101242
Stheno at full precision would be better than something bigger quantized?
When would it be useful to use a full precision model over even a Q8 GGUF/8.0bpw exl2 quant?
>>
>>101101242
I also suggest you try 32k with yarn.
16k is guaranteed to work perfectly well, and in my experience 32k also works.
>>
His embarrassing twitter post got literally destroyed by a simple extra step
https://x.com/airesearchtools/status/1804187673839518187

It's AI, it has limits and will never be perfect no matter what.
>>
>>101101242
Yeah, but Stheno-V3.2 is a bit too much of a pushover. Is there anything else similar that puts up a fight? Is there a mix or a merge you'd recommend?
>>
File: lol.png (20 KB, 592x220)
>>101101305
elon lives rent free in his head geeg
>>
>>101101305
literally who
>>
>>101101287
>Stheno at full precision would be better than something bigger quantized?
yeah, you can't really run 70B models at a reasonable quant and I don't think there are any models below that which are better than Stheno. Mixtral finetunes are way smarter but at the same time boring as fuck, also you would have to use a really low quant which would strip that smartness anyway so there is no point in my opinion.
>When would it be useful to use a full precision model over even a Q8 GGUF/8.0bpw exl2 quant?
you can use q8 quants or full precision, it doesn't matter, they are basically the same and both fit your graphics card so use whatever
>>
File: 1709115613025673.jpg (114 KB, 1200x676)
>>101101236
yan lecun is right and has always been right.
>>
>>101101305
Excuses. Sorry, still laughing that a fucking SOTA model still needs to be told to check over its work, when, if it truly had a strong problem-solving world model, it would've caught its own retardation in literally any of the convoluted steps it used to reason out the response.
>>
>>101101242
I tried it at Q6. It seemed completely boneheaded at Q&A and didn't feel better for RP than anything else in the tiny bracket.
>>
>>101101305
>>101101336
>this is the guy who we're relying on to save open source AI
It's so fucking over
>>
>>101101416
The funny thing is, the model only realizes it's wrong because you implicitly said it's wrong.
This also means that there's a high chance it will think a correct answer is wrong and then rewrite it.
>>
>>101101336
he is speaking the truth tho, have you ever heard Elon speak about technicalities in AI? I've worked in ML for a few years now and Elon sounds like a fucking moron to me and makes me cringe every time with his retardation, I can't imagine how he must look to someone with LeCun's knowledge and experience
>>
>>101101410
I don't remember seeing this slide, what presentation is it from?
>>
>https://x.com/airesearchtools/status/1804188308592894063
>4o couldn't get it right even when told to review itself
Oh no no no ClosedAIbros
>>
File: cai.jpg (168 KB, 922x515)
>>101100004
never used cai but the complaints sound a lot like some llms
>>
>>101101490
>no blushing like a tomato
>>
>>101101490
Don't forget characters randomly starting to wag their tails (regardless of what species they are).
>>
So now that Anthropic is probably going to BTFO GPT-5 with Claude 4 Opus, how will ClosedAI compete?
>>
>>101101236
we'll see who's laughing when we have local cat simulators running on a single 4090 in a year
>>
>>101101504
are you sure? are you ready? are you really sure you're ready?
>>
>>101101449
Yeah. You can see that with even gpt4 and claude for everything but the most obvious.
Could be a quirk of how the models are trained (the way the data is formatted for example) or a characteristic of the architecture itself, but it's really noticeable.
Is the superCOT dataset published somewhere?
I might try to fine tune a model on self CoT.
I bet I could make a LoRA overfit the output layer so that it always outputs
>CoT reasoning
>Actual reply
Something like gemini, which always seems to try to output things as lists.
>>
>>101101427
l3 models famously degrade a lot with quantization so that may be it. This or your settings/templates.
I can't run 70B models so I can't say anything about them but I tested most popular and quasi-popular tunes below that and nothing is even close for RP.
>>
>>101101508
I heard GPT 4.5 was going to be released this month but got delayed so they could train it more to BTFO 3.5 Sonnet.
>>
>>101101542
Kek. Pathetic.
>>
>>101101310
Sounds like a prompting skill issue honestly. Just write in the character card that she is extremely hard to persuade into doing things because she doesn't like doing what people tell her or something. Of course the model is gonna be a pushover by default since instruct models are designed to comply and this one is tuned for ERP so if you are trying to get in its pants it's easy because it's the expected development.

>>101101427
You were using llama3 instruct format right? Also the thread likes to praise Stheno as the ultimate model for VRAMlets but while I find it to be very nice at writing natural sounding RP/ERP for such a small model it's not too bright. Fimbulvetr-v2 is still much more capable in terms of being smart imo. Stheno will usually fumble with specific anatomy or spatial awareness while Fimbulvetr will mostly get it but at the cost of sounding a bit more robotic/boring. I switch them around a lot. Also try lowering temp from the suggested settings, I'm not quite sure why he suggests setting it that high when I get nonsensical gens even at 0.8 sometimes and need to regen.
>>
>>101101542
they already released it, GPT 4.5 is GPT-4o
>>
>>101101532
>Is the superCOT dataset published somewhere?
https://huggingface.co/datasets/kaiokendev/SuperCOT-dataset
>>
>>101101534
>famously
I've seen that claim a handful of times but I've never seen any comparisons, logit analysis, or anything of the sort.
I should try and check that myself, but I'm testing so many things already.
>>
>>101101560
>You were using llama3 instruct format right?
Possibly. My notes don't have the format so either it was before I learned to check those or I forgot to write down whatever I'd used.
And mostly either I now guess based on whatever looks similar to what I see in Kobold's terminal, or if the model is fast enough, run through them all and see what sucks least.
>>
>>101101560
>Fimbulvetr-v2 i
Interesting, even before Stheno came out (or I knew about it maybe) I thought fimbu was nice but not too smart.
At least for the somewhat complicated things I'm playing with, stheno is just better. Mixtral is the next best thing from my own experience.
>>
>>101101579
Thank you, gonna try doing a thing with it.
>>
>>101101305
llms are a mix of unsupervised and supervised learning. It's stupid to expect them to reason.
We need a new architecture that is fully based on reinforcement learning.
>>
>>101097950
How exactly does a LLM 'reason'?
>>
>>101101508
closedAI wins by selling all the data collected to the NSA
>>
>>101101669
they don't
>>
>>101101669
It doesn't, it's just highly advanced auto complete
>>
>>101101693
That's what I believe, but the person claimed they did so I want to hear how.
>>
>>101101669
I think LLMs emulate reasoning by writing coherent deductions based on the context information, and chaining them together at the end.
>>
>>101094602
Nothing pisses me off harder than anons violating Miku's trans rights
>>
>>101097950
Being overcooked on a riddle means that it can also be overcooked on solutions to other problems, making it harder to answer more novel problems that appear to be similar but are not the same. If it were true that LLMs can reason, then we would see performance on problems like these scale as they get trained more. The fact that they don't, but might even get worse, suggests that we need to intervene and do something that isn't training another regular LLM, whether it's a new training strategy, architecture, or both.
>>
>>101101669
>>101101733
I think the baseline theory is that they work with language, so you can try to emulate reasoning with language, using patterns and structures from which actual reasoning can emerge, hence why CoT is a thing.
There's also the idea that "inner thoughts" or a "world model" can arise inside the network before the tokens are generated.
I'm not quite sure how that would work with tokens that don't correlate to concepts individually, but whatever.
Something like that.
I wonder if these things could better approach reasoning if we started tokenizing whole phrases, sentences, or structures that represent concepts, which could then be correlated with other concepts, as well as whole words and word pieces, in a sort of hierarchical tree.
Something more complicated than what we have now, instead of hoping that the model can just learn to correlate everything by itself during training; give it a hand, so to speak.
>>
I think it's good that LeCun keeps highlighting issues like these. It shows that literally no one, not even the biggest most advanced LLM makers have solved these issues, and that we need to do something else that isn't just scaling, in order to make the next big leap in performance.
>>
>>101101804
Man, mobile posting sucks. How can people primarily post like this?
>>
>>101101834
It's comfy
>>
>>101101834
>claims he mobileposted
>turns out to be correct
How did you tell? I've never mobileposted before so idk how you worked it out there.
>>
>>101101804
Sounds about right, and I think your idea is kind of interesting. What if we used attention/transformers on concepts and how they relate instead of language?
Except we don't have a corpus of data (or even a model of how this data would look) to train it on, but sounds neat
>>
>>101101664
>It's stupid to expect them to reason
And yet that is what most people who've shallowly used or seen ChatGPT believe.
>>
We just need to scale harder desu
>>
>>101101882
I was pointing out how my post was all fucked.

>>101101907
Exactly. It would be an insane task to make a multi-T dataset to train a model from scratch like that, but I also think it would be a worthwhile endeavor.
>>
>>101097950
>t-they were just overcooked
Excuses. Even if medium cooked it's still trained by looking at how words relate to other words, the same way it gets overcooked.
>>
>>101101560
>Stheno
>Fimbulvetr
On a Sao-only diet?
>>
>>101102072
Oh hey, some of my messages in there.

That thing is cooked as fuck. It's nice that it can generalize on its training data, but it's obviously regurgitating specific training it's had on what AI is.
>>
I still don't understand how people are able to use 8B models. The 70B erp tunes are already brain-damaged...
>>
>>101102197
faster spins of the token roulette for another microhit of dopamine when you get the very specific output you want
>>
If you had the ability to have your AI continuously learn, but its responses were 3 times slower, would you enable that ability at the cost of speed, or would you keep it as it is now?
>>
>>101102254
Like a super fast LoRa training?
Fuck yeah. Then I'd toggle it off after a while.
>>
>>101100648
fucking seriously? even coding model requires refusal removal now?
>>
>>101102197
because the gap closed lately, you think that 70B is 10x smarter than 8B while in reality 8B is like 90% smart as 70B
>>
>>101102321
I think you're mentally ill.
>>
>>101102197
70B models aren't 9 times better than 8B models.
>>
>>101102321
>8B is like 90% smart as 70B
Lol.
I suppose that's true if what you're doing isn't very demanding.
>>
>>101102321
I wish I could call this cope but trying miqu made me stop giving a shit. it's just weak data all around
>>
>>101102197
Rich people don't understand what it's like to be poor.

If you're poor you learn as a normal behavior dealing with inadequate things because that's all you have and if you complain you get nothing. (Because when you were a kid, if you complained, what little you had was taken away and given to a more appreciative sibling if possible.)

If 8B is what you can run then you choose to be happy with that.

>>101102254
Following the 8B train, if 8B could be three times the processing time but improved through use so it could be trained (with a journal; you'd want to be able to selectively zot parts that suck or at least save state it so if something goes weird you can fix it) then it'd probably be worthwhile.

I'm 1 T/sec on 55ish GB models and that's my limit. Trying Llama 3 Q8 at 70 GB and it's glacial. Like, one token generated when I started writing this post and I'm still waiting on the second. So it'd only be attractive on a heavily quanted edition to make it fast enough to be worth tripling. (Oh, there's token 2)

I think what would happen is a cottage business. Invest in a power rig, train a model to commissioner's specifications, then sell the journal so they can use that like a LoRA on their model with learning disabled so it's fast enough again.
>>
>>101102321
Nah, not "smarter".
But for RP? Yeah the gap is a lot closer nowadays. Maybe not 90%, but compare the old 65b to the current 8b and you'll see how far things have come.
>>
>>101102321
there are a lot of things 8b simply can't do but 70b still can. even if 8b is fine for a lot of simple things, if you go beyond those at all it's just not an option
>>
Is CR+ a smellfag? It suddenly talked about a pleasant fragrance when nothing mentioned smell in the context.
>>
What is the local alternative to Luma AI?
>>
>>101102450
2MW on huggingface
>>
>eye sparkling
NOOOOOOOOOOOOOOOOOOOOOOOOOO
>>
File: 1703884599456.gif (31 KB, 220x223)
>>101102321
>in reality 8B is like 90% smart as 70B
>>
>>101102321
8B is 90% as smart as 70B and 70B is 50% as smart as a good model
>>
>>101102321
>8B is like 90% smart as 70B
LMAOOOOOOOOO
>>
when will ai be able to neuralink my brain into a custom hentai fantasy
>>
>>101102592
Give it another 10 years, at which point you will likely be told to give it another 10 years.
>>
>>101102592
yes but it will be strictly PG-13 and if you try to do anything funny it will give you a strong electric shock and fill your vision with flashing warnings about keeping things safe and respecting boundaries
>>
>>101102321
I actually agree with this, but only in limited circumstances. For basic characters, straightforward plots, generic sex scenes, 8b really is 90% as good as 70b. But the moment you get into things like stat tracking, multiple characters, odd fetishes, characters with ulterior motives and hidden motivations, cards with weird rules that go against natural reality, etc, 8b just falls apart. While 70b+ generally handles even fairly complex things well.
>>
>>101102810
fine-tune issue
>>
>>101102810
Nah, 8B is unusable stupid. /lmg/ just has shit taste and a need to cope. /aicg/ is the only place where you can get actual opinions about models.
>>
>>101102450
OpenSora recently released their 1.2 version, I've never seen it talked about here
cba to set it up on my computer but here it is if you want to try https://github.com/hpcaitech/Open-Sora
>>
>>101102856
All local models are dogshit then because they use ACTUAL quality over there so what is your fucking point
>>
>>101102905
>OpenSora
I'm not even going to click on the link with a scam name like that.
>>
>>101102922
and you are right, this model sucks ass, but to be fair, I'm pretty certain that if you wanna reach Kling/Luma/Gen3 quality, you'd need fucking 60-70 gb of vram, and in terms of hardware we just can't eat that, thanks Nvidia :)
>>
>>101102922
it's a github link you fucking retard
>>
>>101102933
Nvidia has no reason to cater to poor gooners like /lmg/
>>
>>101102856
Skill issue. Wizard7B is good
>>
>>101102856
it really does depend what you're using it for. If you don't use it for actual intelligent uses like coding or complex roleplay situations, then it doesn't matter that it's stupid.
>>
>>101102943
yeah, their business model is just perfect: wanna get a 24gb vram card? fine, go for the 3090, it's a thousand dollars. What? You want twice the vram? Sure, but the price won't just double, now you gotta pay 15000 dollars
>>
File: lol.png (40 KB, 595x295)
>>101102943
shut up bitch
>>
>>101102973
I think you don't even use these models.
>>
>>101102998
you're proving his point, Nvidia has no reason to cater to regular users, they're making so much money scamming big companies with ultra expensive gpus
>>
>>101102998
Oh nooo they're only the 3rd most valuable now! They should start selling 24GB sticks for 49.95 each!
>>
>>101103025
>they're making so much money scaming big companies with ultra expensive gpu's
the companies know they're getting scammed, but what's their alternative? Using AMD? Pfft... AHAHAHAHAHAHAHAHAH
>>
>>101103057
Dear god we need an antitrust suit. AMD has literally zero chance to compete because CUDA is proprietary yet entirely, indisputably necessary for an increasing number of intensive tasks. I mean seriously. AMD and Intel do not have the tools to compete in any meaningful way. Nvidia gets first pick on server hardware, Nvidia gets first pick on software support, how is any of that supposed to change without a serious breakup?
>>
>>101103114
AMD is not here to compete, it just makes Nvidia not look like a monopoly. They're doing everything they can to not compete with Nvidia
>>
>>101103114
>AMD has literally zero chance to compete
Because they decided not to from the beginning.
>>
File: fuck.jpg (209 KB, 1821x1579)
>>101103133
>AMD is not here to compete, it just makes Nvidia not look like a monopoly. They're doing everything they can to not compete with Nvidia
yep, the Nvidia CEO has relatives at AMD, they're working together to make it look like there's competition but in reality AMD is letting Nvidia take all the cake
https://www.yahoo.com/tech/jensen-huang-lisa-su-family-132052224.html?guccounter=1
>>
>>101103133
If that's their goal they're doing a terrible job. The real reason antitrust can't happen is because the tech industry is putting all the chips down on AI and nobody wants to risk collapsing a house of cards by breaking up the shovel salesman

>>101103151
I mean they could pivot to more consumer-sided things, but they are intent on mimicking Nvidia while always doing significantly worse
>>
>>101103164
>they're working together
no
>>
>>101103232
>The real reason antitrust can't happen is because the tech industry is putting all the chips down on AI and nobody wants to risk collapsing a house of cards by breaking up the shovel salesman
this, and also the fact that Nvidia is a US company. Nvidia making a shit ton of money means the US government also makes a shit ton of money through taxes, it's a system that won't be beaten anytime soon, I'll bet consumer cards will still be under 48gb for my whole lifetime
>>
>>101103114
>>101103232
antitrust could happen if communists succeed in ruining the economy
>>
>>101103320
there should be a middle ground between this current capitalist system and communism though, Nvidia can't just dominate the market like that, that's not a sane market at all
>>
go back
>>
monopoly man bad
gobment says so
>>
>>101103350
>gobment says so
15000 dollars for a 48gb vram card also says so
>>
Name 1 instance of anyone asking for a middle ground and actually proposing a feasible system that doesn't involve going to Narnia
>>
>>101103003
Honestly, I don't. I use WizardLM 8x22b
>>
>>101103369
enterprise hardware has enterprise prices, shocker
>>
>>101102810
since the 10% gap, yeah
>>
>>101103391
that's why there should be a middle ground, that's just a fucking scam at this point, the simple fact that you agree with these kinds of practices shows how brainwashed you are, this shouldn't be a normal thing at all
>>
File: belieb.png (157 KB, 727x581)
>>101103114
hey, don't sell intel short. nvidia is a well-oiled machine doing a great job and asking for an even greater premium. amd sucks so fucking hard at writing software that intel decided to compete and is already in some ways better than amd. it's just a youngling in the race.

on a related note, picrel when and for how much? it seems like the 5090 will be 32GB, which would put picrel's price at $6-10k
>>
>>101103371
Go back to the gulag
>>
>>101103413
didn't they say it was $16k recently
>>
>>101103413
I heard somewhere Gaudi 3 will be around 13k
too much imo
>>
OK, fine, I embrace the sparkling eyes. I'm happy. I like it now, even. It's great. Wonderful.
>>
>>101103413
I wonder how well their software bridge works.
>>
>>101103440
>$16k recently
what? 16k for the 5090 is this a fucking joke?
>>
>>101103451
no, for gaudi 3
>>
>>101101579
>>101101643
It even has the training settings he used.
Bless that man and bless you anon.
>>
>>101103412
>you agree with this kind of practices show how brainwashed you are
nta but r*ddit is literally designed to train them like that, with its karma system & hordes of AI bots shitting out the govt-approved narrative.
>>
8B isn't comparable to 70B but I don't have the patience for 70B even on VRAM. At minimum, Stheno is honestly smarter than command-r 35B even if llama's slop dataset spoils things a little.

That 48 gigs of vram I bought sure was money well spent.
>>
>>101103457
Is that Intel PCIe card as fast as an H100 with 128GB VRAM? 16k isn't that bad. For the speed and power savings alone it'll be the new meta.
>>
File: Gandhi-3.jpg (70 KB, 600x822)
70 KB
70 KB JPG
>>101103457
>buy Gandhi for 16k
>he just strolls around the house, outputs random philosophical quotes and sometimes tries to convince me to use nukes
not worth it in my opinion
>>
>>101103489
yeah, that sounds good but I'm afraid the CUDA ecosystem is far too deeply embedded in engineers' and data scientists' heads, it's like switching from C++ to Ruby after using C++ for decades, not a lot of people are willing to take the risk and not a lot of people would be able to make it work in the first place
>>
>>101103539
>C++ to Ruby
Does that analogy hold? What about Ruby makes it a fit for engineers/data scientists?
>>
>>101103485
Reddit's lack of thread bumping and its karma system basically incentivize parroting, with slight adjustments, whatever was popular the last time a topic was posted. It's a good system when you are looking for community consensus, like tech support, product recommendations, or work-related advice.
Awful for conversations or debates.
>>
>>101103350
capitalism works because it's a competition. it doesn't work when one player removes all the tools the other players have to compete. then it's not a competition, and the price of an item nobody else can make can be gouged because hell, it's not like anyone's going to undercut something they cannot make.

>>101103539
Exactly. It's service lock-in. If Intel made switching from CUDA to Intel easy, Nvidia would lawyer up.
>>
>>101103539
At least people like cudadev are willing to go with the best performance per dollar in the consumer space. So if Intel starts challenging Nvidia on the hardware front, the software will follow, I think.
>>
>>101103539
You start by making your GPU CUDA-compatible by reverse engineering it, then develop and provide for free the tools to run AI on your GPU. Done
>>
>>101103489
only $48,000 to run 405B with partial offloading, $64,000 to fit the whole thing
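Rough math, assuming the rumored $16k per 128GB card and roughly one byte per parameter at 8-bit: 405B params is about 405GB of weights, and 405/128 ≈ 3.2, so four cards ($64k) to hold everything in VRAM, or three cards ($48k) with the remainder spilling into system RAM.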
>>
Do we still struggle with bonds and journeys?
>>
>>101103579
>You start by making your GPU cuda-compatible by reverse engineering it
isn't that what AMD tried to do but failed at?
https://github.com/vosen/ZLUDA
>>
File: file.png (2.96 MB, 2040x1536)
2.96 MB
2.96 MB PNG
two more coming in the mail :^)
>>
>>101103552
yeah my b, I meant to say "C++ to Rust"
>>
>>101103629
>Tried
No, they just funded some random guy and pretended to do something, see this >>101103164
>>
File: file.png (58 KB, 542x304)
58 KB
58 KB PNG
>>101103643
i kind of went balls to the wall with the cpu. i am building this rig for training but i guess i can double my ram and fit some pretty big models on it? dunno if it's worth it desu, i don't really know how computers work
>>
>>101103622
I don't mind journey, bonds, shivers, etc, as long as the model can keep up with the roleplay without creating contradictions, getting anatomy wrong, mixing up characters, etc.
>>
>>101103561
>Awful for conversations or debates.
it would work fine if the moderators weren't there to remove people who aren't saying the "status-quo message", that's how you make a sect: you remove all the bad apples and keep the good goys, and you end up with a fucking circlejerk subreddit where everyone thinks the same
>>
>>101103648
what makes that true about rust?
>>
>>101103643
>another owl bro
You love to see it.
>>
>>101103658
>No they just fund some random guy and pretended to do something, see this >>101103164(You)
yeah but the simple fact that AMD allowed him to make his project open source is kinda "dangerous" for Nvidia, what if the open source people manage to make it work now that they have the code?
>>
>>101103622
Just tell it not to do that shit and it won't.
>>
>>101103658
>>101103694
Are you talking about geohotz?
>>
>>101103680
my point was that when something stays popular for too long, it's hard to switch to something else because people have spent too much time mastering the popular thing in the first place and aren't willing to venture into new territory all alone
>>
So far IMO...
-L3-8B-Stheno-v3.2-Q8_0-imat is nice, but a pushover, even a dom card becomes a sub after a handful of msges, like when a card tells you to beg, you tell it no, it will get angry and some cards can even try to do things like whip you, stab you, kill you, but you slap them a few times and they become a massive sub.

-L3-SthenoMaidBlackroot-8B-V1.Q8_0 feels dumber than Stheno and gets stuck in a loop, keeps repeating lines like face getting redder, don't know if it's the OAS version on horde.

-Fimbulvetr-11B-v2.Q8_0 feels a bit lacking compared to Stheno, like it goes nowhere, and it often tries to speak for you or dictate your actions.

-Merged-RP-Stew-V2-34V.i1-Q4_K_M is slow, and got stuck in a loop in under 20 msgs when it comes across something it doesn't like, like telling a store clerk to demo something and it'll say it's against store policy and keep repeating itself like a broken record.

-Poppy_Porpoise-1.4-L3-8B.Q8_0 is dumb, it thinks bunny suits are the furry easter egg bunny kind in a bar setting, it doesn't understand what birth control is, etc.

-DeepSeek-Coder-V2-Lite-Instruct.i1-Q6_K, that other anon was right, it's too lobotomized for RP.

-LLaMa2-13B-Psyfighter2 seemed decent at first, but kinda wants to just throw itself at you, and has a very very short memory, like it forgets things from 2 msgs ago even when I always have 8192 context size. It doesn't understand that you can't see when you have a blindfold on. All in all, pretty dumb.

-L3-70B-Euryale-v2.1, don't have the VRAM to run locally, and the queue on the Horde is too long for me to really put it through its paces; felt like a more refined Stheno from the few msgs that I got from it.
>>
>>101103678
Mods or not, it wouldn't work unless you have an even number of participating users on both sides of whatever issue, which is unrealistic.
That, or users with the self-restraint not to downvote opinions they disagree with. You can only have that in small communities of high-quality users.
>>
>>101103785 continued

-echidna-13b-v0.3 uh this was mentioned in one of the guides in the OP. It's not very good IMO, it starts off trying to throw itself at you like Psyfighter2, it then confuses genders, and then starts trying to write your thoughts and actions.

-L3-Nymeria-8B.i1-Q4_K_M seemed decent, pays attention to details like the shape and material of things, has a lot of thoughts, but is quick to dictate morals to you, refuses things, writes your actions, does time skips of weeks to months, and then goes on an emotional nosedive trying to be all emo for no reason.

-L3-Arcania-4x8b.i1-Q4_K_M likes to go into detail about their actions, emotions and thoughts, but has problems with genders and actual logic, like you go into an equipment shop to look for female equipment, ask the female shopkeeper to demo it, she tells you they don't have demo units, but then asks if you want to try it on... female equipment, when you're male. But at least it doesn't try to dictate your morals, time skip and go all emo like Nymeria does...

-Hathor-L3-8B-v.01-Q5_K_M-imat this one likes to repeat certain lines or actions, is happy to please, and doesn't think about refusing anything. It's pretty descriptive, the materials, textures and temps of things, but again, it doesn't understand that you can't see through blindfolds, and it seems to have trouble when you type more than a few sentences, like it just ignores the latter parts. It's like they spent all the stat points on describing things and ran out of points. Doesn't really do anything but describe things and reply to you agreeing and waiting for you to say and do things. Understands what a gag does but not a blindfold.
>>
>>101103785
>>101103803 continued

-v2_Kunocchini-7b-128k-test-Q8_0-imatrix this started off amazingly, it actually remembered little details from a character card's opening msg, and it likes to ask questions. It understands things like what blindfolds and gags do. It's not a complete pushover while still being mostly submissive. It seemed to have decent memory at the start, remembering longer than Nymeria, Arcania or Hathor, but then it suddenly forgot things from just 2 msges earlier once I got to around the 24-msg mark.

-Kunoichi-DPO-v2-7B-Q8_0-imatrix I believe this is what Kunocchini was based off of? The first post it made was similar to Kunocchini, but then it completely falls apart, felt like it was censored or something; at least it didn't get stuck in a loop and kept trying to skirt around things instead.

-L3-TheSpice-8b-v0.8.3-Q4_K_S felt similar to Poppy at the beginning, and then randomly had Chinese lines in its reply.

-Llama-3-Lumimaid-8B-v0.1.q8_0 a pushover that doesn't seem to understand what blindfolds are, replies are really short compared to everything else, probably lobotomized too far down from its 70B counterpart.


IMO, Stheno is high up there but such a pushover...
Kunocchini holds promise if the memory issue can be fixed.
Everything else seemed broken or meh; it's probably not ideal that Stheno was the first model I came into contact with. I don't have experience with paid online services like OpenAI, NovelAI, Claude or other things.

Anything else under 24GB that I can try out?
>>
>>101103800
>Mods or not, it wouldn't work unless you have an even amount of participating users on both sides of whatever issue, which is unrealistic.
if the "tiny %" of bad apples had zero impact, there wouldn't be any moderation in the first place; they know they have to remove everything that goes against the status quo

Tbh I did find a system like that, reddit without much moderation, and it's 9gag, and that site quickly turned to the right/conservative side. I guess leftists can only exist with overcensorship, oops I said it kek
>>
>>101103785
My friend, get hardware. You're wasting your time with lowBs.
>>
>>101103833
Have you tried partially offloaded mixtral, either limarp zloss or gritlm?
>>
Models that have real-time video understanding and can play games with you when? I'm tired of just interacting with text.
>>
>>101103725
what makes that true?
>>
any good erp model for a 20GB VRAM GPU, 32GB RAM, Ryzen 7 5700X?
>>
>>101103904
It took Meta a full year to add image support to their models and we don't know how shit it will be yet. Maybe in 2 or 3 years they'll add video to their 3B and 900B models.
>>
>>101103904
In theory, with enough hardware, you could create a system that feeds frames to a model and uses its output to control the game in real time.
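Very rough sketch of what that loop could look like in Python, assuming you have some multimodal model behind a local OpenAI-compatible endpoint; the URL, port, and toy action list are all made up, and whether your particular server accepts image messages is on you to check:

import base64, io, time
import requests
import mss                       # screen capture library
from PIL import Image

ENDPOINT = "http://localhost:8080/v1/chat/completions"   # assumed local server URL
ACTIONS = ["left", "right", "jump", "wait"]               # toy action space for the game

def grab_frame():
    # capture the primary monitor and return it as a base64-encoded JPEG
    with mss.mss() as sct:
        shot = sct.grab(sct.monitors[1])
        img = Image.frombytes("RGB", shot.size, shot.rgb)
        buf = io.BytesIO()
        img.save(buf, format="JPEG")
        return base64.b64encode(buf.getvalue()).decode()

def ask_model(frame_b64):
    # send the current frame plus an instruction, get back a single action word
    payload = {
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": f"Pick one action from {ACTIONS}. Reply with the action only."},
                {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64," + frame_b64}},
            ],
        }],
        "max_tokens": 5,
    }
    r = requests.post(ENDPOINT, json=payload, timeout=60)
    return r.json()["choices"][0]["message"]["content"].strip().lower()

while True:
    action = ask_model(grab_frame())
    print("model says:", action)   # wire this into pyautogui or similar to actually press keys
    time.sleep(0.5)                # "real time" only if your hardware keeps up

The hard part is everything this skips: latency, the model hallucinating actions, and keeping any memory of the game state.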
>>
>>101103859
A single 3090 is as good as it gets for me, heck I wouldn't even have been able to afford a second-hand 3090 if it weren't for the crypto craze; power and heat are also issues, 40C/104F with 43C/110F real temp outdoors, 34C/93.2F indoors, and 24 cents/kWh electricity costs.

>>101103890
No, I've not tried anything that exceeds my VRAM capacity. Wouldn't that take a super long time to do anything? RP Stew was already taking nearly 160 seconds
>>
File: IMG_0616.jpg (2.85 MB, 4032x3024)
2.85 MB
2.85 MB JPG
>>101103643
Mine has just been sitting around because I'm autistically trying to come up with a way to dustproof this shitty mining frame before I add in the stuff from my main llm rig.
>>
>>101103904
there are some game-playing models, but they have no text component
https://danijar.com/project/dreamerv3/
>>
Believe in Ursidae-300B.
>>
>>101103953
instead of having preprogrammed animations, it could generate them on the fly
>>
>dust
Just leave it as an open rig and invert it instead. That makes cleaning a lot easier since you only need to blow out the hollow components.
>>
Sometimes I don't know what to think.

I wanted to see how well our friend CR+ knows monster movies, so I asked it to do some roleplay from the premise that we start with me doing something that triggers an event like something out of a goofy 80's creature film, but with characters I specify.

>CR+ Q4_K_M
It worked, but kinda slow because vramlet.
I change a few settings in Kobold and up the layers over the automatic suggestion. (Well, I doubled it and ran out of RAM but I dropped it by one and then it loaded okay.)

Same prompt, same model go.
It writes the scene, but more elaborate.
Then it adds [End of Part].
Then it gives a word count. (An accurate one.)
Then it tells me to feel free to continue it.
Then it starts writing emoji. It wrote in emoji a precise description of what happened in the scene.
Then it adds a horizontal rule, adds a note expressing its intention with the mood of the scene and requests any adjustments I might like.
Then it wishes me a good day.
Then it adds a P.S. saying that my choice of a film scene was interesting and can be explored further.
Then it said that this concludes the narrative.
And then it generated blank lines till I hit Abort.


However unnecessary, this is the kind of fun shit I play with LLMs for, and now I'm worried that it's a one-off flash of lightning in a bottle and it'll never be cool like this again. Because this feels like AGI came out of nowhere just to pull my chain.
>>
>>101103968
>No, I've not tried anything that exceeds my vram capacity wouldn't that take a super long amount of time to try to do anything?
Not really. As long as you can offload at least 80-85% of the model I think you'll find the speed acceptable.
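For example, with llama-cpp-python it's a single knob; this is just a sketch, the filename and layer count are placeholders you'd tune until the quant barely fits your 24GB:

from llama_cpp import Llama

llm = Llama(
    model_path="mixtral-8x7b-instruct.Q4_K_M.gguf",  # placeholder path to whatever quant you grab
    n_gpu_layers=28,   # as many layers as fit on the 3090; the rest run on CPU
    n_ctx=8192,
)
out = llm("[INST] Write a short scene. [/INST]", max_tokens=256)
print(out["choices"][0]["text"])

koboldcpp's --gpulayers option is the same idea if you'd rather stay in the UI.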
>>
File: 1475637741539.png (450 KB, 637x475)
450 KB
450 KB PNG
>>101104035
Just do this and blow the dust away with your compressor
>>
Just tried L3-8B-Poppy-Sunspice via the Horde. I sent my first msg after a card's standard opening msg and its first reply ignored me, rehashed the opening msg and just immediately looped lol... completely unusable, yet it has nearly 400 in the queue...
>>
>>101104090
OK what advantages does THAT give? More desk space? What the hell.
>>
>>101104125
Less noise.
>>
>>101103833
https://huggingface.co/turboderp/llama3-turbcat-instruct-8b
>>
>>101104048
I like playing around with that kind of thing too, asking the model to do meta analysis of the scenes, make suggestions, etc; that's why I made that state prompts extension, to do that kind of thing with 8b models without confusing the shit out of them in the process.

>>101104125
Omnidirectional airflow.
>>
>>101104148
>Less noise.
you could've just decreased the fan speed and undervolted the cpu a bit; turns out that going from 100% fan speed to 70% doesn't make much of a temperature difference, and it's way less loud that way
>>
>>101104163
That one is about as good as stheno in my experience, so I'm seconding the suggestion.
>>
>>101104172
Yeah, do all that shit after you hang your PC on the ceiling for even less noise.
>>
Hey is there a GGUF out for the new Sao10K/L3-8B-Stheno-v3.3-32K yet?
>>
>>101104164
What astounds me is that it threw a whole stack of OOC postscript features at me at once, unbidden, till it eventually ran out and dumped blank lines (there were many blanks between each feature as well).

It didn't keep doing that, so I started asking it for emoji summaries for the fun of it. But I'm also worried that since I left it on the default 2k context this session will turn to crap soon. I save-stated it, but again, who knows if it will be awesome again.

>state prompts extension
I'm unfamiliar with this term.
>>
>>101104257
https://huggingface.co/mradermacher/L3-8B-Stheno-v3.3-32K-GGUF
It was pretty shit compared to 3.2
>>
>>101104369
This thing https://github.com/ThiagoRibas-dev/SillyTavern-State

>>101104374
How does it compare with using NTK or YaRN?
>>
after finishing an RP it has hit me that i have a separate computer sitting in a closet that cost $4000 and is used just to cum
where did it all go wrong
>>
>>101104410
people waste a lot more money on things that offer even less
>>
>>101104410
Are you within your means and getting the equivalent satisfaction out of it?
Then good.
Donate some processing to a folding@ho.e project when you aren't using it and you'll do some good in the process of cooming too.
>>
>>101104443
>folding@ho.e
Old meme that got absolutely BTFO by the Alphafold AI
>>
>>101104443
>folding@ho.e project
is this some sort of distributed sex chatbot project? i would donate
>>
File: 1706287296829.jpg (51 KB, 680x431)
51 KB
51 KB JPG
>>101104410
>where did it all go wrong
the gender war happened
>>
>>101104483
holy shit i was not aware south koreans were based?
>>
>>101104163
>>101104199
Quick testing doesn't really impress me: it also fails to understand what a blindfold or a gag is, it forgets things from 2 posts earlier by post number 24, and it also started trying to write lines and actions for me.

-v2_Kunocchini-7b-128k-test-Q8_0-imatrix felt better to me...
>>
>>101104500
that's the country with the lowest birth rate, it will probably die out within 50-100 years, men and women literally hate each other there kek
>>
>>101104500
megalia
>>
>>101095184
https://www.characterhub.org/characters/Vyrea_Aster/doppelganger-interrogation-simulator-654daf19
>>
>>101104500
>holy shit i was not aware south koreans were based?
South Korean guys resent women because they're the only ones who have to do two years of military service. Like usual, women live on easy mode.
>>
Yeah, it feels like L3-8B-Stheno-v3.3-32K-GGUF is all over the place: it's forgetting gender, tries really hard to write actions for me and ignores what I say. Unusable...
>>
>>101103833
Try Wizard7B
>>
>>101104692
Well, guess regular Stheno with YaRN is still the go-to then.
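If anyone wants to try stretching regular 8k Stheno themselves, llama-cpp-python exposes the rope knobs; sketch below, and the yarn parameter names/enum value are assumptions you should check against your installed version:

from llama_cpp import Llama

llm = Llama(
    model_path="L3-8B-Stheno-v3.2.Q8_0.gguf",  # placeholder filename
    n_ctx=32768,
    n_gpu_layers=-1,       # the whole 8B fits on a 24GB card
    rope_scaling_type=2,   # 2 = YaRN in llama.cpp's rope-scaling enum (assumption, verify for your version)
    yarn_orig_ctx=8192,    # the model's native training context
)
# simpler fallback that definitely exists: drop the two yarn lines and set
# rope_freq_scale=0.25 for plain linear 4x context stretching instead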
>>
>>101104740
>Wizard7B
are you talking about Wizard Vicuna 7B Uncensored? Won't that be too far lobotomized compared to the full Wizard?
>>
Whose future models do you expect will be better, Qwen or Deepseek?
>>
>>101103904
>real-time video understanding
Just build a system for that with llava
>can play games with you
Depends on the game and your ability to build an API for your LLM to play it
>>
>>101104774
>>101104774
>>101104774
>>
Who wants to help me build AGI? Looking for this skillset
- Self motivated
- Pure C programming
- Experience crafting machine learning algos from scratch
- Ability to read research papers and implement in code
>>
>>101104777
This one https://huggingface.co/bartowski/WizardLM-2-7B-exl2
>>
>>101104631
>two years' military service
two years of ntr
>>
>>101103488
Stheno is a retarded coom tune.


