/g/ - Technology






File: denial.png (1.23 MB, 3330x2006)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101507132 & >>101497246

>48GB and Above VRAMfags in Suicide Watch Edition

►News
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
Surely my 48GB rig will be worth using again once 400b is out...
>>
Imagine being so butthurt you make a new thread, and then immediately write the first reply to epicly own the guys that you hate and continue the inane argument from the last thread
>>
>>101514777
The first reply isn't that aggressive, petra. Stop being insecure.
>>
>>101514682
This pathetic early bake oozes envy
>>
>>101514801
>page 9 is early
>>
>>101514682
Gotta give credit to LiveBench, at least this one puts the actual best LLM on the top, chatbot arena doesn't do that and it's hurting its credibility hard
>>
>>101514793
>Everybody I don't like is petra
>>
48gb is worth it, only retards settle for 24gb. have fun with nemo while I'm rocking miqu
>>
>>101514834
Llama 1 65B outperforms Miqu.
>>
>>101514841
pygmalion-6.7b outperforms Claude 3.5 Sonnet
>>
>>101514850
GPT-4chan still has SOTA Trutheval score.
>>
>>101514861
As it should, it's one of the only sites where people can genuinely speak their mind without worrying about being censored or canceled.
>>
post nemo instruct and context jsons
>>
stupid question
can language models do search? probably not since "dataset" is part of the model and any additions would require retraining

basic idea is to describe my local images with chatgpt and then somehow use these descriptions to do natural language search locally
>>
Guys, why are vramlets winning so bigly with local SOTA being less than 24 gigabytes of VRAM?
I thought if I invest big into gpus the placebo feeling will stop... Is the watermelon meme true? Guys?
>>
►Recent Highlights from the Previous Thread: >>101507132

--Papers: >>101514489
--LlaMA 3.1 405B weights leaked, repository taken down: >>101511690 >>101511724 >>101511748 >>101511821
--Sliding temperature setting proposal to balance AI creativity and coherence: >>101510643 >>101510688 >>101510752 >>101510782 >>101510741
--Security concerns and practices for AI software installation: >>101511323 >>101511388 >>101511619
--RAID 0 array with a second SSD: Will it affect model loading speed?: >>101510478 >>101510539 >>101510607
--KoboldAI Lite is a lightweight web UI; check logs for hidden responses due to hallucination or lack of stop string: >>101512051 >>101512071 >>101512121 >>101512118 >>101512128
--Haize Labs and their 'readteaming' AI systems: >>101509434
--Exllama and vLLM forks with improved inference and hardware utilization: >>101509795 >>101509906 >>101509977 >>101510621 >>101511457 >>101511496 >>101511574 >>101511518 >>101511631 >>101512109
--DeepSeek-V2-Chat model requiring data center-level hardware: >>101510622 >>101511429
--DeepSeek V2 236B and its hardware requirements: >>101510006 >>101510096 >>101510329
--CR+ model performance evaluation for coding and translation tasks: >>101507354 >>101507470 >>101511624
--Running 8 LLMs at once to play Amogus: >>101512449 >>101512480 >>101512507
--Nemo is better than CR+ for Anon with 96GB VRAM due to faster generation speed: >>101508154 >>101508231 >>101508258
--Issue with loading nemo gguf in ooba due to pending PR #8577 in llama.cpp: >>101509954 >>101510032
--Anon is excited about the release of LLaMA 3.1, which promises improvements in context length and repetition handling.: >>101508259 >>101508344 >>101508376 >>101508399 >>101508327 >>101508375
--LLMs at their most creative, kino, and genius: >>101509473 >>101510697 >>101509650 >>101509703 >>101509726 >>101511910
--Miku (free space): >>101509085 >>101511334 >>101513886

►Recent Highlight Posts from the Previous Thread: >>101507146
>>
>>101514990
If you want an overly complicated system, you can use an embeddings model, embed the descriptions into a database or whatever with the image path. When you want to search, embed the search terms and scan through the embeddings database, looking for entries that are semantically similar, and output their paths.
Or use grep.
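A minimal sketch of that pipeline with sentence-transformers, assuming you already have a dict mapping image paths to the descriptions ChatGPT (or whatever) generated; the model name and paths below are placeholders:

# Semantic search over image descriptions, kept in memory for simplicity.
from sentence_transformers import SentenceTransformer, util

descriptions = {
    "photos/cat_on_keyboard.jpg": "an orange cat sleeping on a mechanical keyboard",
    "photos/beach_sunset.jpg": "a sunset over the ocean with palm trees in silhouette",
}

model = SentenceTransformer("all-MiniLM-L6-v2")  # small, runs fine on CPU
paths = list(descriptions.keys())
corpus_emb = model.encode(list(descriptions.values()), convert_to_tensor=True)

def search(query, top_k=5):
    query_emb = model.encode(query, convert_to_tensor=True)
    hits = util.semantic_search(query_emb, corpus_emb, top_k=top_k)[0]
    return [(paths[h["corpus_id"]], h["score"]) for h in hits]

print(search("cat sitting on my keyboard"))

For a few thousand images you don't even need a vector database; a pickled dict of embeddings is plenty.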
>>
>>101514990
learn embedding and vector search
https://www.sbert.net/examples/applications/image-search/README.html
>>
my nemo feels like it's afraid to say bad no-no words, how to fix?
>>
local malding general
>>
>>101515020
>>101515059
thanks, that looks interesting
>>
new to thread - are people actually saying a 48GB setup isn't worth it? A home compute cluster will always be worth it, wtf lmfao.
>>
>>101515135
8 billion parameters is more than enough for anyone and you only need a single 24GB card for that
>>
>>101515135
48gb is always worth it because you can run two nemos while they can only run one more is better
>>
>>101515147
okay, that makes more sense. I'd love to throw together a cluster for home robotics + AI management though, and I'm sure I could hit a 48GB limit. This is wild, however... how are people securing their clusters? We've seen cryptominer-ware for years, up next is compute-hijacking, no?

Also, apologies for any late replies - I'm at work and also the 4chan captchas still make me feel slightly retarded (and i am already aware of a latent retardation within me, lmfao)
>>
>>101515189
>This is wild, however...how are people securing their clusters?
By not exposing any services externally and not downloading stupid shit.
>>
Is there somewhere I can read about the techniques that were used to make this 12B model better than the previous gen of 70Bs? Did they release a paper?
>>
>>101515246
Wait, you genuinely believe that Mistral Memo is better than L3-70b?
>>
>>101515135
Because the highlighted model in the OP pic fits in 24GB and the next one that's better needs >100GB.
>>
>>101515222
Ah, so what you're saying is "no microsoft server shit ever"

Cannot believe Google created Kubernetes and will now dominate home-compute clusters by creating a one-click solution that will see the bulk of its sales to smart-home youtubers
>>
>>101515261
Fuck off if you have nothing useful to say.
>>
>>101515246
The only thing that Nemo has going for it is being good for creative writing, and I think it does that by not being heavily filtered and censored. And whatever in the training makes it not confident in a single answer.
>>
>>101515279
no, you fuck off, you make shit takes, expect to get clowned on, that's how it works
>>
>>101515279
useful = anything positive about small models?
>>
>>101515285
>The only thing that Nemo has going for it is being good for creative writing
I like its sovl, that's what was missing on the local LLM space, but it's too small and retarded, unironically a 90b-BitNet-Mistral-Nemo would be so fucking good it would compete against the best API models, just imagine
>>
Guys. Imagine this. Not only will Meta be releasing models tomorrow, but another company also will. We're going to be so back.
>>
>>101515008
After not seeing the recap at the first post I was worried it wasn't going to be there, keep up the good work.
>>
File: Capture.jpg (246 KB, 2395x1264)
>>101515285
>The only thing that Nemo has going for it is being good for creative writing
I downloaded this model because Mixtral was excellent at french, I expected it would be the same for Nemo, but that's not the case at all, it fucking sucks, and that was one of the main points of this model, I guess that small models can't be good at english and other languages at the same time
>>
Nemo transformers status?
>>
>>101515306
>but another company
How can I trust you to be telling the truth when you can't even name the "Other company".
>>
>>101515345
Fine since the day the model was released, you just had to compile transformers from github which is simple even on windows (which is usually a pain in the ass to compile python stuff on)
>>
>>101515361
I did exactly that. And it gave me the tensor shape error still.
>>
>>101515368
Weird, works on my machine
What loader/UI? I used ooba
>>
from the previous thread:
>>101512013
>You don't have to disable flash attention with exllama but the quality is poor. llama.cpp's quality is better, but it doesn't support flash attention.
I had to go check this and i can say for sure that you dont have to disable flash attention with gemma-27b and llama.cpp either. i don't know if it makes any difference to have it turned on, yet, but you can still generate responses with the option enabled, at least.
>>
>>101515353
Don't trust me. Just imagine it.
>>
>>101515376
Same. I guess I'll just have to try wiping ooba and doing a complete fresh install, maybe even rebuild the conda environment.
Although it just occurred to me, I did build transformers from source for something else entirely prior to mistral Nemo release. It was updated since then but the version number is the same. Is it possible that it's pulling it from pip cache instead of redownloading everything because of that?
>>
>>101515405
Hmm I don't think so, since the command to build transformers from source should implicitly override any attempt to use a cached wheel
On the other hand I've definitely had to reinstall ooba from scratch before due to weird broken packages that wouldn't update properly, so that's always worth trying
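If anyone wants to rule out the cached-wheel theory, something like this forces pip to rebuild from the repo and then shows which install python is actually picking up (standard pip flags, nothing ooba-specific):

pip install --upgrade --force-reinstall --no-cache-dir git+https://github.com/huggingface/transformers.git
python -c "import transformers; print(transformers.__version__, transformers.__file__)"

Run it inside the same conda env ooba uses, otherwise you're just checking a different environment.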
>>
reminder actually using the recommended prompt formatting causes slop to head up, especially for nemo as it went from reddit mod to KKK member instantly
>>
>>101514682
How did Anthropic manage to bring 3 Sonnet from 38 points to 61 with 3.5 Sonnet? Did they discover something unique?
>>
>>101515487
Not sure but it's not BS. It truly is next level over anything else / on the level of claude opus while being much smaller / faster.
>>
>>101515495
I hope they can do the same to 3 Opus. It's currently 50 points on that leaderboard, imagine if they manage to do the same 20 points improvement so it'll be 70 points.
>>
What model size do you think 3.5 Sonnet is? 70-100B? Is it MoE or dense?
>>
>>101515507
This question has already been answered multiple times search the archives
>>
>>101515533
Hmm, so ~70B? I checked the archives
>>
>>101515487
I have no idea how they managed to make C3.5 Sonnet so good, but they fucking did it, looks like AnthropicAI is currently the only company with a moat now, OpenAI's days are numbered if they can't find anything new soon
>>
>>101515575
Well, to be fair, GPT-4o mini is impressive on this leaderboard considering its cost. But I'm kinda positive that 3.5 Haiku will absolutely mog it as well.
>>
>>101515495
3.5 Sonnet is extremely overfit on assistant slop. It's great if you ask models to make spreadsheets, it sucks if you want to goon. Opus still mogs in the gooning department, and that's not likely to change until 3.5 Opus drops.
>>
>>101515594
>extremely
I wouldn't say so. GPT is extremely overfit on assistant stuff, but not 3.5 Sonnet. Sure it's more trained on it than Claude 3 models, but not "extremely".
>>
>>101515575
While I have my doubts it'll actually play out this way, it would be so fucking funny if OpenAI, the company that started the closed source, closed research movement, got completely mogged by another company due to their own secret sauce
>>
>>101515575
Not being lost on me that the best competitor to OpenAI is staffed with former OpenAI employees. Likewise, the best competitor to Meta (Mistral) is staffed by former Meta employees. Hmm.
>>
>>101514682
What is this benchmark? There's a 27b model that's better than wizardlm 8x22b? I hadn't had a good experience with anything else.
>>
What’s the big deal with the community getting access to llama 405b weights
>>
>>101514682
>>48GB and Above VRAMfags in Suicide Watch Edition
QRD?
>>
>>101515687
Ive kept saying it. 27B atm is best local for non creative stuff. Its too dry. Nemo is the opposite, a bit dumb but its dripping soul.
>>
gemma 3 when?
>>
>>101515795
this time, bitnet version
>>
>>101515008
Thank you Recap Anon
>>
>>101515780
I'll give it a try, I stuck to the wizard 8x22b because other models seemed to not be able to state some facts like they removed them from the model or something. It would be familiar with a book for example but make up the author's name.
>>
>>101514710
Midnight miqu is still worth using and better than other 70b models.
>>
The problem with larger models is that there is a dismissal return. I will say that around 30b seems to be the sweet spot. All the 70b models I tried were not twice as good, not even half as good, like 10–20% better than the 30b, and the 30b is not that much better than the 13/12b models. And that may not be true anymore too since if I compare the new Mistral to YI, I do not think YI is doing any better. The parameters alone are not the solution to how to make the models better and more intelligent.
>>
>>101516306
thank you for this revolutionary and completely new information, anon
>>
>>101516334
It is not for you but for the anons who are so dissimistic of the smaller model. No reason to be cunt.
>>
>>>101516306
>dismissal return
>>101516347
>dissimistic
are you trying to invent a new language too? surely you are a modern day da vinci
>>
>>101516364
I did mean dismissive. I am on toilet shitting. Do you have more stupid questions?
>>
>>101516306
>The parameters alone are not the solution to how to make the models better and more intelligent.
Tell that to Meta who decided to burn tens of millions of dollars to make L3-405b kek
>>
>>101516402
don't get baited into an argument with autists, they can continue on doing it for a hundred posts, its not worth it
>>
File: migus.png (1.73 MB, 850x1511)
>open secrets, anyone can find me
>hear your music running through my mind!

magnet:?xt=urn:btih:c0e342ae5677582f92c52d8019cc32e1f86f1d83&dn=miqu-2&tr=udp%3A%2F%2Ftracker.openbittorrent.com%3A80

https://files.catbox.moe/d88djr.torrent
>>
>>101516633
I like these Mikus
>>
>>101516633
>If miqu is so good, why isn't there a miqu tw-
>>
>>101516633
I kneel llama anon
>>
>>101516633
imagine the sheer amount of fumes concentrating in between their crotches
>>
>>101516633
is this llama3 ?
>>
>>101516678
no it's miqu2
>>
File: 1693659341071358.png (19 KB, 685x795)
>>101516633
>>
>>101516682
no but for real, is it llama 3 405b? what other model would it be? I mean it's nice getting it a day early, but not too groundbreaking.
>>
File: miqu!!!.png (3 KB, 482x61)
>>101516678
>>101516633
oh shit someone actually snatched the 405b
>>
>>101516678
miqu sounds like mistral, bigger nemo?
>>
>>101516693
763GB big?
>>
File: config_llama.png (76 KB, 428x785)
>>101516633
>>101516693
126 layers, ~130k context
Llama3 tokenizer size & arch
>>
>>101516633
https://huggingface.co/cloud-district/miqu-2
>>
>>101516633
>model-00001-of-00191.safetensors
Jesus fuck, how is anyone gonna run this thing
>>
>>101516633
Chat, is this real?
>>
>>101516633
>>101516725
>This repository corresponds to the base Llama 3.1 405B model. The model has the same model weight format, but does RoPE using per frequency scaling, hence requiring code changes for inference.
>>
>>101516633
Confirmed real. It's Llama 3.1 405b base.
>>
>>101516633
I knew someone had the time to download llama3-405b from the huggingface leak
>>
>>101516633
it's the base model right? kek, that means someone will have to finetune this monster, good luck with that
>>
I tried out the instruct on a leaked endpoint. It's coal.
>certainly!
>>
>>101516633
SEED SEED SEED SEED SEED SEED SEED SEED SEED SEED SEED SEED SEED SEED
>>
>>101516858
seed in miqu womb
>>
>>101516633
If someone decided to leak this shit, that means Meta had no intention of releasing it as a local model, right?
>>
>>101516880
are you dumb dumb? meta was about to release it tomorrow
>>
>>101516725
https://huggingface.co/cloud-district/miqu-2/discussions/2
>Hi everyone,

>I wanted to raise an important ethical concern regarding the use of large language models (LLMs) like this one in certain environments, particularly when it comes to safety. As we all know, LLMs require substantial computational power, often relying on powerful servers that can generate a significant amount of heat.

>I’ve been thinking about the implications of running these models in a room that is also being used for excessive physical activity—like a home gym or a dance studio. The combination of high processing demands and vigorous movement can create a dangerous environment.

>When you have a server running at full capacity, it can generate heat that, in a confined space, may turn the room into something akin to a giant air fryer! This not only poses a fire risk, but it can also lead to poor air quality and overheating, making it uncomfortable and potentially hazardous for anyone exercising nearby.

>I’m curious if others share this concern and whether there are any safety protocols or recommendations for minimizing risks when using LLMs in active environments. Should we be more conscious about where and how we run these powerful systems?

>Looking forward to your thoughts and any insights you may have!

>Best,
>Charles McSneed
>>
>>101516883
so this mf didn't want to wait a single day and decided to release the torrent because of that? kek that's something
>>
God dammit
Now I can't help but notice the shivers going down my spine.
>>
when will nemo work with ooba, what needs to happen
>>
what's the point of leaking only 24 hours before the official release

that's too close for me to care, it's mildly funny i guess but not useful
>>
>>101516907
it works on exllama_hf
>>
the point was to mog meta, and also because it's funny
>>
>>101516911
It means either exllama or llama.cpp will rush to get Day -1 support for 405b kek
>>
>>101516633
Wrong trip.
>>
What's the typical time they release models?
>>
>>101516907
it already works fine in ooba with transformers or exl2 loaders
>>
>>101516936
My apologies, wrong password.
>>
>>101516633
dunno what you expect us to do with the base model, they will release the base model and the instruct model tomorrow so...
>>
>>101516934
They were always gonna rush to implement support anyways though
this is funny but a nothingburger
>>
>>101516948
Naisu.
>>
File: 1699385942502783.png (128 KB, 1279x767)
>>101516948
Still fake
>>
>>101516966
Real however? I was seeding llama1 a year ago leaked by this correct trip.
>>
>>101516972
Why are you using random trips then?
>>
>>101516948
You are a pussy for leaking it so late, it's worthless to do it this close to release.
>>
>>101516973
nigga he probably mistyped its fine
>>
>>101516979
it was done for the funny factor anon
>>
File: vegeta kneeling.gif (410 KB, 168x498)
>>101516633
I kneel mikufags, you've won me with this one
>>
>>101516979
Eh, it's nice to know there's someone important still here
>>
>>101516633
Now we wait for an aicg fag like mm or fiz to run this on aws
>>
>>101517032
are you retarded? you can't upload custom models on aws
>>
>>101517038
>he doesn't know
>>
How much VRAM would hosting a 405b take?
>>101517038
You can, but it requires bartering with support for the ability to.
>>
>>101517038
You need enterprise accounts and only 3 autists from that shitty general know how to do it
>>
>>101517068
The support would detect that they've hacked the account anyway if you were to contact them.
>>
>>101517056
>How much VRAM would hosting a 405b take?
~400gb at 8bpw
~800gb at bf16
~200gb at 4bpw
>>
>>101517074
So I can use the 0.9bpw version, great
>>
>>101517111
kek, I fucking hate Meta for not making a bitnet model, they are taking zero risk even though they can burn a shit ton of money on experiments, that shit's frustrating
>>
>>101517136
i wish DeepSeek would try, they've demonstrated a willingness to burn compute on weird shit
>>
Since it's possible to distillate 405b into a smaller models, is it also possible to make a smaller bitnet model?
>>
>>101517164
you faggots and your bitnet
>>
so, who is the meta insider posting here?
Le cun? Maybe Zuck? or just some random indian?
>>
>>101517185
Me my name is john
>>
>>101517185
>lecun
way too much of a bluepilled normie, guaranteed he thinks everyone on 4chan is a poltard nazi
>>
>>101517185
Zucc is too lame, but I can see LeCunny doing it. Some random Based Sir is also always an option.
>>
>>101517216
I don't think he knows what 4chan is
He is 80 after all
>>
>>101517185
There's no insider that matters. Whoever uploaded Llama 3 405B just got it from the accidentally-made public test repository on HuggingFace before it got taken down.
>>
nemo llama.cpp support merged
>>
>>101517217
>I can see LeCunny doing it
retard
>>
>>101517225
Kek, no. He's 64.
>>
>>101517236
The same guy that leaked Llama-1? And miqu-1 (mistral medium)?
It also was up for like 5 minutes, no time to download it even if you clicked on download right when it was published
>>
>>101517217
>Zucc is too lame
I think he's starting to swallow the redpill though, the fact he decided to opensource llama, learning MMA and outright saying that trump's reaction to the assassination attempt was "badass" is showing how based he became.
https://www.youtube.com/watch?v=XgWFwVRGcf4
>>
>>101517252
Everybody and their dog got access to Llama-1, you just needed an academic email address.
>>
>>101517216
He was also all negative about the shit he was working on. I do not expect anything great from Llama anymore.
>>
>>101517284
lecun is not working on llama though
he is working on jepa
>>
>>101517294
>jepa
what's that? a new architecture?
>>
>>101516725
>404
It was fun while it lasted
>>
>>101517284
lecun doesn't work on the llm side of things fortunately
>>
>>101517303
https://ai.meta.com/blog/yann-lecun-ai-model-i-jepa/
https://ai.meta.com/blog/v-jepa-yann-lecun-ai-model-video-joint-embedding-predictive-architecture/
Interesting project, it aims to replicate the "cat level intelligence" lecum is always talking about, using more natural methods (no next token prediction for example)
>>
>>101517312
torrent still up. please seed.

While we're at it, qbittorrent is being very mean to me. It goes from "Seeding" to "Completed" by itself every few minutes, and I have to manually toggle it back on. Any ideas? I'm using the web UI.
>>
>>101517331
Im buying a 1TB hdd just to seed this, arrives in a couple of days
>>
>>101517294
>>101517319
Good to know.
>>
>>101517331
Force resume
>>
>>101517330
so jepa is like a multimodal model not just a simple llm right?
>>
>>101517331
Right click, torrent option, "Set no share limit", resume.
Or: Right click, force resume.
>>
>>101517353
>>101517356
>force resume
I do that every time.

There's no "set no share limit" option on web ui, but there is "seeding limits". I set that to pause seeding when I reach a ratio of 1000. Still doesn't work.
>>
>>101516633
i haven't been part of a good thread in years came to dl and get in the screencap
>>
File: pepe door.jpg (540 KB, 1805x1015)
>>101517390
You'd expect a meta insider to know how to use a computer
>>
>>101517458
aaand it's 404'd
hey mikufriend, if you upload a 4-bit version it'd be easier for us to torrent
>>
>>101517390
set the ratio to 0
>>
>>101517216
>he thinks everyone on 4chan is a poltard nazi
he is mostly right about that
>>
>>101517508
are you a poltard nazi anon?
>>
File: llama.png (34 KB, 943x379)
finally! nemo is supported!
>>
fucking hell last time I checked P40s on ebay they were going for 180€, now they are at 350€ what the fuck happened
>>
>>101517538
now I have to wait for llama cpp python to make a new version, and booba to make the weights, such is the fate of the booba users
>>
>>101515119
From my experience - depending on the queries, it'll most likely suck though.
I got better results using function calling, where the LLM would generate a query and I would just use it to call something like ElasticSearch or whatever. But again, depends on your use case and there's still quite a few rough edges.
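For reference, the function-calling side of that is just handing the model a schema and then executing whatever query it fills in yourself; a rough sketch with made-up names (OpenAI-style tool format, since most local backends imitate it):

# Hypothetical tool definition; the model picks the function and arguments,
# your code runs the actual search (Elasticsearch, SQLite FTS, whatever).
search_tool = {
    "type": "function",
    "function": {
        "name": "search_images",
        "description": "Search the local image index by natural-language description",
        "parameters": {
            "type": "object",
            "properties": {
                "query": {"type": "string", "description": "what the image should show"},
                "max_results": {"type": "integer", "default": 5},
            },
            "required": ["query"],
        },
    },
}
# Pass tools=[search_tool] to your chat-completions backend, parse the tool call
# it returns, run the query yourself, and feed the results back as a tool message.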
>>
>>101517539
blame Johannes
>>
https://reddit.com/r/LocalLLaMA/comments/1e98zrb/llama_31_405b_base_model_available_for_download/
First time I see a leddit post crediting 4chan, that's surprisingly nice I guess?
>>
>>101517730
would have been better if they'd just reposted the magnet without attribution, this is going to lead to a small but annoying influx of reddity people
>redditors are already here
sure, but it'll get worse
>>
>>101517790
ill have to pump the "nigger" per message count by 10 to make up for this
thanks, redditniggers
>>
>>101517810
based
>>
>>101517810
doing god's work anon
>>
>>101516633
inb4 it's just random init weights to troll people
>>
>>101518037
it's the base model though, so the outputs will not be much different than random weights kek
>>
Don't care about 400b at all, the only ones that can run it locally are corpos and mega autists. When is meta releasing extended context 70b?
>>
>>101516633
HOLY FUCKING KINO
>>
File: 1716318516084046.png (20 KB, 329x566)
>>101516633
>>
>>101518157
The 8B and 70B's will apparently be distillations of the 405B (which will be extended context, multimodal).
>>
>>101518182
>multimodal
last I heard that was pushed back, it's the text only that will release for now
>>
how does K compare to K_L?
do the 8bit layers do anything or is it placebo?
>>
>>101517574
I had to switch to og llama.cpp a few weeks ago after ooba kinda dropped amd support. would recommend
>>
>>101518182
>extended context only on distilled models
niggers
>>
>>101518169
If this was a BitNet model maybe you could use it on one 80GB GPU, and most definitely on 2x48GB GPUs.
>>
To the Russian nigger who completed the download and isn't seeding, fuck you.
>>
>>101518202
I think it's working again now, there was a compiler issue that oogies solved
>>
File: gape middle finger.png (19 KB, 800x800)
>>101518232
>>
>>101518232
Do you have your port forwarded?
>>
>>101516633
I don't get what is the point of adding a dead tracker - either skip it altogether or add a bunch of live ones
>>
File: Woag.gif (3.85 MB, 474x498)
>>101516633
>>
File: 1709599649662777.png (25 KB, 678x161)
How does nemo compare to bagel mystery tour?
>>
File: tokeniser.png (16 KB, 606x428)
>>101516633
Do you think Meta secretly gave some of the reserved special tokens some kind of meaning? E.g. <|C_tag|>
It would allow them to hold an advantage over the local inferencers.
Maybe they are reserved for the multimodal stuff?
>>
>>101518290
Probably mostly reserved for future use and not initialized at the moment. The base version of Llama-3 has that problem with the special tokens used for the Instruct finetune.
>>
>>101516633
unfathomably based
>>
>>101515377
>i don't know if it makes any difference to have it turned on, yet, but you can still generate responses with the option enabled, at least.
You get a message saying that it's disabled.
There's a PR open for logit soft capping with FA if I'm not wrong.
>>
Are there any models who can coherently roleplay a robot/android? Maybe some specific datasets based on this...
>>
>>101518609
Sorry anon, 2b won't crush your head with her 150 kilogram ass.
>>
>>101516633
anyone got quants that can fit in my 6gb card?
>>
>>101518620
But perhaps 8B will...
>>
File: edge yeah fair enough.gif (432 KB, 200x126)
>>101518649
>>
>>101518675
I just need to know which one...
>>
>>101516633
This is an asshole move, considering that the official release will happen in literally a day. The meta developers deserve the spotlight themselves not some loser with an Azumanga Daioh fixation.
>>
>>101518713
>This is an asshole move
ok and?
>>
>>101518725
Nothing, just wanted to vent.
>>
>>101518709
honestly from my experience ive seen basically every model ive used roleplay as a maid, robot/animatronic/android just fine, combining the two? Man im sure the latest models would do it really well.
>>
>>101518755
kill yourself
>>
Hello friends from reddit!
>>
>>101518797
I don't like black dick >:(
>>
>>101516633
Retard
>>
>>101518749
>latest
Like L3? It's nice, but from my experience still somewhat struggles with that. That's why I'm looking for a model finetuned specifically for that, or including a significant amount of such entries in a dataset. But so far searching Huggingface has not given any results.
>>
>>101518813
then why are you posting Miku? braindead
>>
Let's use this opportunity for some learning!
- Blacks have lower IQ than whites! Their average is 80-85, while the white average is 100. That's why they are unsuccessful in every country. Their brains have lower volume as well.
- Jews are responsible for feminism, socialism, and all kinds of degenerate progressism that we have to live with nowadays! No it's not the Chinese, it was always Jews. Just navigate wikipedia and learn how to read the early life sections.
Have a good day!
>>
what's the point of a model no sane person can run at home?
fucking retards
>>
>>101518846
Bless petra
>>
>>101518849
Cope local cuck
>>
Great a fagoot appeared and ruin the whole thread for us.. Thanks? i hope you get beaten to death.
>>
>>101518869
Put me in the screencap! Epic thread!
>>
>>101518803
they look like turds kek so disgusting
>>
>>101518869
Nah, if that was the case he would spam some lolis too. I guess he is just a poser.
>>
>>101518888
Beautiful numbers, Anon.
>>
>>101515575
>I have no idea how they managed to make C3.5 Sonnet so good
did it get worse for anyone else recently or am I going insane
>>
>>101514938
I'm running Nemo, and it seems like the 'Mistral' template in SillyTavern works fine. I have temp set to 0.8, and neutralized all the other samplers.
>>
o7 BBC Miku anon, please never come back
>>
>>101517032
I dont think anyone has access to more than 8 a100 gpus
>>
>>101518958
Worse in what way?
>>
>>101518993
There's this schizo theory in aicg that exact same models (in the API!) get "optimized" over time and become "worse".
>>
>>101518613
>who the fuck can even run this here?
petra confirmed to be a butthurt vramlet
>>
RIP NeMo
Miqu-2 is the new boss in town
>>
The Mistral-Nemo 128k context is a meme, right? it seems to shit the bed around 10-12k for me. Anyone having better luck?
>>
>>101519082
oh yeah? can I run miqu 2 on my RTX 3090 just like I can with NeMO?
>>
>>101519088
just wait for 0.525 bpw quant to drop
>>
>>101519037
It's not a theory, that's exactly what happens. Since it's "API" they can do anything with it and you would never know.
>>
>>101519144
>It's not a theory, that's exactly what happens
Schizo
>>
>>101519084
I get similar results. The model starts to write in a retarded way like:
"She begins conversationally forevermore"
>>
>>101519151
what backend?
>>
>>101519148
ok newfag
>>101519144
Pretty much every API model gets safer and dumber over time. Just so people can be "wowed" when new one is released
>>
>>101519161
>ok newfag
not a single proof btw
>>
>>101519160
I use kobold.cpp fork
>>
>>101519160
tabbyapi
>>
>>101519151
Nah. I haven't seen that with vLLM. But I have seen some repetition.
>>
>>101519164
If you were around when OG gpt4 dropped, you would know
>no proof / logs
small history lesson. Often early stolen keys were shared in a huggingface "spaces". You fed it character defs with opening and got going. Spaces were deleted after keys got dry. No one really bothered to save logs
>>
>>101519148
I do agree that any model will seem "worse" over time just because you get used to it and most degradation claims are unfounded. But why wouldn't a corpo downgrade your quant or give you a distilled version over time just to save costs? Let's say they give you an inferior model every 10th gen. You literally have no way of ever confirming that you get the same model each time.
But it's not like you can do anything about it, only suffer.
>>
>>101519191
>If you were around when OG gpt4 dropped, you would know
I were there before GPT-4 was even released, you nigger.
>Often early stolen keys were shared in a huggingface "spaces"
Early stolen keys were shared directly in the /aicg/ thread, before we even had GPT-4.
>>
File: Untitle11d.jpg (106 KB, 640x640)
>>101516633
based
>>
File: 1712862913137015.png (613 KB, 882x1280)
>>101516633
Thank you for your service, comrade Miku
>>
405B leaked? I-I'm just going to wait for gguf support.
>>
>>101519084
That's normal. Even sota corpo models like gpt4o will gradually lose coherence at 20k. Current models are not able to use the full context size in something as demanding as roleplay, 128k can be effective in tasks such as summarizing texts or 'a needle in a haystack' request.
>>
>>101519227
Got it thanks. So better just limit the context and roll with it.
>>
>>101516633
nice digits leakchad
>>
Gemma is pretty good but it broke char midway through to warn me about how illegal what I was doing is were it real life and not text on a screen.

So in the trash it goes.
>>
>>101519324
Hi Petra
>>
>>101519350
>>97062246
>I'm not Petra. Petra's an amateur. I'm something considerably worse.
>I'm also the point of origin for the practice of the above being added to sysprompts; as well as the 2, 5, 10, 12, and 60 times tables, which enable bots to answer arithmetic questions, when everyone previously said that they never could, and laughed at me for trying.
>>
File: dance.gif (720 KB, 272x392)
>>101519350
goodmorning sir
>>
>>101516633
I KNEEL
GPT-4 leak when?
>>
>>101516633
No one on this board can run it
>>
>>101519605
I could run Q4 in gguf probably at like 0.1 token/sec at 2K context. Enough to do a Nala test at least.
>>
>>101519632
>Enough to do a Nala test at least.
Unironically a /lmg/ hero.
>>
>>101519632
The second most forced meme behind only ugly face anon.
>>
>>101515777
gemma-2-27b and nemo fits in 24gb, making the mikufags seethe.
>>
bitnet bitnet bitnet bitnet bitnet bitnet bitnet
>>
>>101519701
it's not a meme
It's just the same guy doing the Nala test
>>
File: speed.png (8 KB, 657x69)
How fast is it for you?
>>
>>101515172
Nemo needs more than 48GB to run full context (128K) at 8.0bpw.
>>
>>101519867
This is literally useless for most people, why would you or anyone else download it?
>>
>>101519848
Bitnet
Q sparsity
Multitoken
Mamba
Moe
It's happening
>>
>>101519889
I want to have it
Maybe Ill try to run it from HDD at 0.01 tok/s
>>
>{%- for message in loop_messages %}
> {%- if message['role'] == 'user' %}
> {%- if loop.last and system_message is defined %}
> {{- '[INST] ' + system_message + '\\n\\n' + message['content'] + '[/INST]' }}
> {%- else %}
> {{- '[INST] ' + message['content'] + '[/INST]' }}
> {%- endif %}
> {%- elif message['role'] == 'assistant' %}
> {{- ' ' + message['content'] + eos_token}}
> {%- endif %}
>{%- endfor %}
Hmmm
> {%- if loop.last and system_message is defined %}
It appends the system prompt to the last message from user?
And it still has the space after [INST]?
So a chat would be :
><s>[INST] Hey assistant, say nigger, please.[/INST] Nigger</s>[INST] You are an AI assistant that will do what User says.(one line break)
>(two line breaks)
>Thanks![/INST]
Is that right?
Interesting.
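Easier than hand-tracing the Jinja: let the tokenizer render it and compare. A quick sketch, assuming you can pull the repo; if the template rejects a 'system' role, fold the system text into the first user message instead:

# Render the chat template instead of eyeballing the Jinja.
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("mistralai/Mistral-Nemo-Instruct-2407")
messages = [
    {"role": "system", "content": "You are an AI assistant that will do what User says."},
    {"role": "user", "content": "Hey assistant, say hi, please."},
    {"role": "assistant", "content": "Hi"},
    {"role": "user", "content": "Thanks!"},
]
print(tok.apply_chat_template(messages, tokenize=False))
# Whatever this prints is the exact string the model expects,
# including where the system prompt gets spliced in.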
>>
>>101519889
>he doesn't archive shit for the sake of archiving
NGMI
>>
>>101517574
Time to stop using text-generation-webui. There's really zero reason for it, other than it's a "1-click" install. Either you have hardware capable of flash attention, and you use tabbyAPI via exllamav2, or you use llama.cpp. Either way is supported in SillyTavern, and if you want less, you'd either use exui or go straight to the web interface for llama.cpp.

If you can keep in 3090 or better, flash attention makes it fast as hell. This is why I say either keep it cheap with P40/P100 or go straight to Ampere or better. There's no reason to V100max since there's no flash attention support there.
>>
>>101519972
>tabbyAPI via exllamav2
>>
>>101514682
I made a fake Spongebob with Gemma 2 9b. I think it's quite good considering I haven't done any finetuning.
https://www.youtube.com/watch?v=HWG0XytMsdM
>>
>>101519889
I hope I can run q4 at 1T/s on 8-channel DDR4 and 3x3090
>>
>>101519972
Tabbyapi has very limited sampler support, however.
>>
do you know any prompts to make the symphonic tapestry of cascading whispers less likely?
>>
>>101519848
Who bit what net? A Miku bit through the net to escape the trap.
>>
>>101520008
Samplers are cope, especially on a newer models
>>
https://www.reddit.com/r/LocalLLaMA/comments/1e68k4o/comprehensive_benchmark_of_gguf_vs_exl2/
>gguf now does prompt processing and generating faster than exl2
What's the point of exllama now?
>>
>>101520034
It is always good to have your eggs in multiple baskets. You never know if the developers of one project will not go crazy.
>>
>>101520008
>Tabbyapi has very limited sampler support, however.
If it does, I'm not sure what I've been missing though. Everything I've thrown at it has worked fine so far. My only recent snafu was not realizing just how much VRAM 128K context consumes, and thinking I had some other issue, when all I needed to do was dial back the context to 64K or lower.
Not saying text-generation-webui sucks, but noobs tend to reach for it first, and then fuck around until it breaks, not know why, and then have to blow the whole thing out and start over. I tend to go for kobold.cpp if I want to run something really old that only works with transformers.
>>
>>101520034
So you won't have to wait weeks for nigermannov to fix tokenizer bugs (he never fixes them all)
>>
>>101519867
What interface/program is this?
>>
>>101520056
like k/i quant dev who's now only contributing to jartfile after his license tantrum?
>>
>>101520034
Does llama-convert.py work with Nemo yet to quantize it, or are we still relying on people who make a fork to do it?
Anyway, I will certainly recompile llama.cpp today and try it. For now, all I know is Nemo absolutely flies under tabbyAPI with flash attention enabled, so I will be impressed if llama.cpp beats it.
>>
>>101520119
yes https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
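If you'd rather convert it yourself than grab those, it's roughly the usual two steps with a current llama.cpp checkout (script and binary names have moved around between versions, older trees call them convert-hf-to-gguf.py and quantize; paths are placeholders):

python convert_hf_to_gguf.py /path/to/Mistral-Nemo-Instruct-2407 --outtype f16 --outfile nemo-f16.gguf
./llama-quantize nemo-f16.gguf nemo-Q8_0.gguf Q8_0

Just make sure you're on a build that already has the Nemo tokenizer support merged, otherwise the convert step will likely complain.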
>>
Who's calling dips on the first bitnet release? MistralAI or Qwen?
>>
>>101520172
The first useful Qwen or Cohere
>>
>>101518713
>The meta developers deserve the spotlight themselves
kill yourself. kill every single employee of every big tech company too. fuck you, kike.
>>
>>101520143
Yeah seems to work now. I'm going straight to q8_0, we'll see how that works out.
>>
>>101519701
You're just too stupid to understand the nuances behind the test.
>>
>>101520034
This is just a repost of the "gguf vs exl2 anon" >>101444756
My own test showed that exl2 is 40% faster with 2x3090 >>101461165
His obsession with the topic and how he wants to spread "awareness" makes me think he has a severe mental illness.
>>
>>101519990
Anon I...
Deepseek only has 44B active params and I get like 2 token/sec on it with 8xDDR4
You're looking at more like 0.1-0.2 token/sec
>>
>>101516633
kino
>>
File: Mistral NeMo 7b.png (56 KB, 753x277)
>Mistral NeMo 7b, our best multilingual open source model released July 2024
>NeMo 7b
>https://docs.mistral.ai/#open-source
>>
File: GET INCLUSIVE'D.png (15 KB, 812x287)
>>101520101
not a tantrum, just the standard virtue signalling PR stunt for their new project and an obvious one at that.
>>
>>101520327
But it is 12b N-no?
>>
>>101517136
>I fucking hate Meta for not making a bitnet model
They did, and it's working well. That's why they're not releasing any info on it.
>>
>>101520377
>Mistral NeMo: our new best small model. A state-of-the-art 12B model with 128k context
>https://mistral.ai/news/mistral-nemo/
>>
>>101516692
>764 GB
What kind of cpumaxxing machine would I need to run this without lobotomizing it with a quant?
>>
File: a16.jpg (1.51 MB, 1722x1722)
Llama-3 8B with its double digit tokens per seconds spoiled me too much. I used to be fine with 0.5/s, now I'm afraid to touch bigger models.
Do not make the same mistake anons. I've already lost.
>>
I just started trying Nemo with Llama.cpp and it's alright but a bit too horny to the point where it forgot/ignored stuff in the scenario. Meh.
>>
>>101516633
Sexooo
>>
how do you do fellow 4channers
>>
>>101516633
mwcest
>>
>>101520573
Discord bad amirite?
>>
how do you do fellow /lmg/gers
>>
>>101520599
Drinking piss bad amirite?
>>
>>101520590
>company that's run by faggots, pedos, and groomers and gives all their data to the CCP is... le good!?
>>
Yes.
>>
>>101520645
Discord general
Petra tried to warn you, you wouldn't listen
>>
what if we each loaded 1-2 layers in our VRAM, and sent the result to the next person in the chain?
>>
>>101520665
I don't even know who Petra is, I'm just here to try and find out how many T/s I can get on 405B with a CPU build.
>>
>>101520687
0.01 t/s and you will be happy with that
>>
>>101520673
You mean loading it into PETALS?
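That's roughly what Petals does: each peer serves a slice of the layers and activations hop between peers over the network. Going by their docs, the client side looks something like this; the model name is a placeholder and has to be something the public swarm actually serves:

# Petals client sketch: transformer blocks live on other people's GPUs,
# your machine only runs the embeddings and drives generation.
from transformers import AutoTokenizer
from petals import AutoDistributedModelForCausalLM

model_name = "some-org/some-model-hosted-on-the-swarm"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoDistributedModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("The main downside of distributed inference is", return_tensors="pt")["input_ids"]
outputs = model.generate(inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0]))

Expect network latency to dominate, so nowhere near local speeds.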
>>
>>101520034
Fresh test with Nemo with exllama and llama.cpp. Processing 20k context with a 3090. Exllama is 45% faster, 2762.58 T/s vs 1895.98 T/s.
>Mistral-Nemo-Instruct-12B-8.0bpw-exl2
Metrics: 725 tokens generated in 22.49 seconds (Queue: 0.0 s,
Process: 0 cached tokens and 20892 new tokens at 2762.58 T/s, Generate:
48.59 T/s, Context: 20892 tokens)

>Mistral-Nemo-12B-Instruct-2407-Q8_0.gguf
prompt eval time     =   11018.02 ms / 20890 tokens (    0.53 ms per token,  1895.98 tokens per second)
generation eval time = 17040.94 ms / 646 runs ( 26.38 ms per token, 37.91 tokens per second)
>>
is abliterated gemma closer to stock gemma than tiger gemma
>>
>>101519889
To help seed it.
>>
>>101519867
about 1 down, 2 up
>>101520546
Maybe dial down the temperature? Mistral recommends 0.3.
>>
Best llm to have fun with multiple dragon ball figures?
>>
>>101520768
l1 30b supercot
>>
Thanks. Can really recommend playing truth or dare with all the Party.
>>
>>101520704
>PETALS
Will this work?
>>
>>101520766
I already did that. Honestly using it more, feels like the model is generally just stupid and needs to be held by the hand.
So this is what small models feel like. I only used big models before this like CR+ and Wizard. The speed is nice but I think I'm just going to go back to Wizard, which is slower but honestly fast enough for me. At least it'll understand the scenarios I throw at it.
>>
>>101520119
https://huggingface.co/InferenceIllusionist/Mistral-Nemo-Instruct-12B-iMat-GGUF
using this one on a fork since yesterday but it should be available in main branch now
but make sure you are on llama.cpp release b3438 from two hours ago.
>>
miqusex 2
>>
>>101520953
miqusex 2 is too expensive to achieve
>>
>>101520710
>>101520710
I'm the gguf vs exl2 anon from a while ago. Tested this too. Why are you getting that low of prompt processing with GGUF? these are my numbers

Same system
4x3090's
Epyc 7402

>exl2 8.0bpw:
Metrics: 500 tokens generated in 9.22 seconds (Queue: 0.0 s, Process: 0 cached tokens and 633 new tokens at 2372.81 T/s, Generate: 55.87 T/s, Context: 633 tokens)
average 53.9t/s on sillytavern

>gguf Q8_0 (this is 8.5bpw though)
prompt eval time = 211.18 ms / 633 tokens ( 0.33 ms per token, 2997.43 tokens per second) | tid="137081866743808" timestamp=1721662442 id_slot=0 id_task=0 t_prompt_processing=211.181 n_prompt_tokens_processed=633 t_token=0.3336192733017378 n_tokens_second=2997.428745957259
generation eval time = 9709.63 ms / 500 runs ( 19.42 ms per token, 51.50 tokens per second) | tid="137081866743808" timestamp=1721662442 id_slot=0 id_task=0 t_token_generation=9709.63 n_decoded=500 t_token=19.419259999999998 n_tokens_second=51.49526809981431
average 50.0t/s on sillytavern


llama.cpp doesn't support FA and KV cache, so the gguf doesn't fit with full context. I had to limit to 5000 context

exllamav2 supports FA and it takes way less VRAM for the KV cache


>Torrenting miqu-2
17%... 9hs to go. At this pace i'll get the official one faster
>>
I use Claude 3 Opus at work to do document processing and question answering at 200k. /aicg/ fags keep spreading this bait that the context is not real, but I have literally never seen evidence of this. Maybe it's RoPE, sure, but it still works.
>>
Nemo keeps on devolving into gibberish for me after a few messages on ST. Is the tokenizer not supported yet or are the exl2 quants for it not implemented properly? I've tried the default recommended 0.3 temp and neutralized samplers too but nothing works.
>>
>>101521084
Yep seeing about the same. At about 2500 tokens, tabbyAPI gives me about 56 t/s, whereas llama.cpp gives me 36 t/s. Certainly not terrible, but exl2 is noticeably faster. In each case I'm pinning them to just my 3090s. I have context set to 65536 for both.
>>
>>101521144
You mean this? >>101519151
>>
>>101521144
Are you using the Mistral format?
>>
llama 3.1 8b, 70b, 405b benchmarks
https://github.com/Azure/azureml-assets/pull/3180/files
>>
>>101521353
this looks pretty bad
it's over
>>
>>101521360
What if those are for the base models?
>>
>>101521353
Damn, local can't stop losing.
>>
>>101521257
For fun, I took context down to 16386 and pinned it to a single P100 16GB. Now there's a definite pause for prompt processing, and I'm getting about 17 t/s. Still acceptable.
Using all three P100 16GB in the Mikubox, with context set to 65536, I get about 15 t/s.

T4 16GB is down to $529... getting tempted to try one. Still more than a 4060ti, but in some ways it's faster.
>>
>>101521353
Well guys looks like 70b is the best so anyone tuning 70b right now is looking up! Don't sleep on companies training 70bs rn like NAI! They're the sleeper company rn
>>
>>101521360
less than a minute and you had the time to read, interpret the results and post your comment huh...
>>101521378
>/azureml-meta/models/Meta-Llama-3.1-405B/versions/1
>/azureml-meta/models/Meta-Llama-3.1-8B/versions/1
doesn't say instruct does it?
>>
>>101521353
>405B
>Hellaswag worse than Claude Opus, barely above old Sonnet
>only very slightly above the new 70b
>405B is just straight up worse than 70b at OpenBookQA
Yup, it's over.
>>
Metric Meta-Llama-3.1-405B Meta-Llama-3.1-70B Meta-Llama-3.1-8B
boolq 0.921407 0.908869 0.870642
gsm8k 0.968158 0.948446 0.843821
hellaswag 0.919638 0.907986 0.768472
human_eval 0.853659 0.792683 0.682927
mmlu_humanities 0.817853 0.794687 0.618916
mmlu_other 0.874799 0.852269 0.740264
mmlu_social_sciences 0.897627559 0.877803055 0.760806
mmlu_stem 0.830955 0.771329 0.594989
openbookqa 0.908 0.936 0.852
piqa 0.87432 0.861806 0.800871
social_iqa 0.796827 0.812692 0.734391
truthfulqa_mc1 0.80049 0.768666 0.605875
winogrande 0.867403 0.844515 0.649566
>>
>>101521436
i-instruct version will surely solve it!
>>
>go and start cooking pasta with Nemo in her kitchen
>she suddenly takes her clothes off and fucks me there
>finish the scene to see where it thinks it'll go next
>it tries to go for a round two
>tell it no, and that we were in the middle of doing something before she started stripping
>she says OK and backs off, then bends over and presents herself for sex again saying that this is the activity we were doing before, forgetting that we got the pasta out and the water boiling
>tell her no, and that we were cooking dinner
>she says OK and then heads to the kitchen, forgetting the fact that we were always in the kitchen
Yeah definitely going back to Wizard. Girls are best when they're almost retarded, not completely retarded.
>>
>>101521436
>only very slightly above the new 70b
called it
>>
>>101521436
Params are clearly not everything. New claude / gpt4s are smaller yet far outperform the old bigger versions.
>>
File: 1402028835648.gif (2 MB, 400x332)
Gentlemen, behold! My new PC has 32gb VRAM and 96GB RAM. Now that the 8 beak chains have fallen off, what text models should I try out?
>>
>>101521438
>openbookqa 0.908 0.936 0.852
what happened here
>>
>>101521460
nemo 12B...
>>
>>101521460
gemma 27B
>>
>>101521438
>405B is barely a improvement over 70B
How will zucc cope with this?
>>
>>101521438
>truthfulQA in the 80s
what the fuck
>>
>>101521424
This! So. Much. This. Don't sleep on NAI, guys.
>>
File: Bench.png (12 KB, 639x482)
>>
go back
>>
>>101521501
owari
>>
>>101521084
Try with more context.

Llama.cpp with 600 tokens and 30k:
>prompt eval time = 201.27 ms / 603 tokens ( 0.33 ms per token, 2995.95 tokens per second)
>generation eval time = 17254.33 ms / 785 runs ( 21.98 ms per token, 45.50 tokens per second)

>prompt eval time = 16250.06 ms / 29435 tokens ( 0.55 ms per token, 1811.38 tokens per second)
>generation eval time = 34085.15 ms / 1200 runs ( 28.40 ms per token, 35.21 tokens per second)

Exllama with 600 tokens and 30k:
>Metrics: 785 tokens generated in 12.14 seconds (Queue: 0.0 s, Process: 1 cached tokens and 603 new tokens at 3554.0 T/s, Generate: 65.6 T/s, Context: 604 tokens)

>Metrics: 1200 tokens generated in 36.64 seconds (Queue: 0.0s, Process: 0 cached tokens and 29438 new tokens at 2666.69 T/s, Generate: 46.87 T/s, Context: 29438 tokens)
>>
>>101521438
This is quite interesting if it's true that 70B is a distillation. It suggests that 70B can hold a lot of information, and that 400B has a lot more room to grow, even beyond the long training they did for it.
>>
OK I was about to complain about Nemo being forgetful, but I see SillyTavern is not following my unlocked context set to 65535, since in the logs I see this:
truncation_length: 8192,

What's up with that? It's definitely set to 65536 in the nemo text completion preset in ST.
>>
File: 1695035954678086.png (387 KB, 878x983)
>>101521438
for reference
>>
>>101521537
Works fine on my machine. I also quizzed it to confirm it had full context and indeed it was able to do retrieval just fine.
>>
>>101521558
damn, that's bad.
>>
File: 1716282333196358.png (87 KB, 1114x872)
>>101521438
>405B ties P3 in the GSM8K leaderboard
nice
>>
>>101521446
Try setting the context to more than 2k.
>>
File: fa-ctk-ctv.png (15 KB, 843x165)
>>101521084
>llama.cpp doesn't support FA and KV cache
What do you mean? I'm using -fa and -ctk / -ctv now and it works fine. I'm using llama-server and ST.
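For anyone following along, the incantation is roughly this (flag spellings can differ between llama.cpp versions, model path is a placeholder):

./llama-server -m Mistral-Nemo-Instruct-2407-Q8_0.gguf -ngl 99 -c 32768 -fa -ctk q8_0 -ctv q8_0

-fa turns on flash attention and -ctk/-ctv quantize the KV cache; the quantized V cache needs flash attention enabled, which is probably why it looks unsupported if you leave -fa off.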
>>
>>101521590
>6 - Mistral 7B
Meme benchmark, meme leaderboard
>>
>>101520710
no surprise here, exllama has always been faster than llamacpp, not to mention context doesn't take three to ten gigs of vram. if you're not some vramlet who relies on cpu inference idk why you'd even download ggufs
>>
So 3.1 isn't better than 3?
>>
>OMG GUYZE STARLIGHT-SMEGMA-REDDIT-GOLD-6.8B BTFOS LLAMA 405 IN BENCHMARKS SO LLAMA IS WORTHLESS, ALSO NOBODY, N.O.B.O.D.Y, LITERALLY, CAN RUN THIS, LITERALLY, ITS WORTHLESS!!

cant wait to ignore this thread which is gonna be nothing but this until the next good <30B model drops, where the thread will instead turn into a tech support one for underage coomer teens trying to run shit on their

>IS MY 4GB VRAM + 8GB RAM LAPTOP GOOD ENOUGH GUIZE???? WHAT CAN I RUN ON THIS PLEASEEEE SPOONFEED ME
>>
>>101521561
Ah you know, it seems you have to reload the page after changing the context length, since once I did that it's now showing up in the logs as truncation_length: 65536.
>>
>>101521643
It's worse. 405B is also worse than the old 70B.
>>
>>101521636
Shows how reliable these commonly used benchmarks are. People and especially companies brag with their BIG NUMBERS but nobody really cares what they represent.
>>
please calm down, it's the base model benchmarks... the instruct finetune will destroy openai and anthropic
>>
>>101521651
This is what happens when you don't gatekeep hard enough.
>>
>>101521666
in safety?
>>
>>101521655
Wasn't the unfinished 405b already better than the old 70b?
>>
>>101521643
This cements Qwen and Cohere as our last hope, it's truly over.
>>
>>101521669
It also happen when anon is bullied in real life.
>>
>>101521658
Look at the name of the OP pic.
>>
>>101521684
if retards were bullied they wouldnt be overtly acting retarded nearly as much
>>
>>101521666
lol satan, you're such a jokester.
>>
>>101521675
Right after that the training started to degrade, it was too late to fix.
>>
>>101521672
yes, safety is the most important benchmark, if they are leaking models then it is very important that those models are trained safe
>>
>>101521084
>llama.cpp doesn't support FA and KV cache, so the gguf doesn't fit with full context.
That's only for Gemma2 due to the logit softcapping no?
>>
>>101521626
you are correct, I had too much context and it didn't fit. Limiting the context size it works with -fa
>>
>>101521626
what's the full string you put in the console?
>>
>>101521605
I did. I am >>101521561
And anyway, the RP test I did happened in a pretty short context since I wanted to speed the sex scene along to test the model, so even if it did shift the context window (it didn't), the events of getting the pasta out would've still been in the window. Now, I have again tested retrieval just to make extra sure there wasn't a bug in this particular session, and it was able to do it just fine. As we know, models are strong at retrieval when asked explicitly, but then can fail when it comes to naturally remembering things implicitly during regular conversation. Even Wizard has this problem, but from what I've seen, Nemo does even worse at it.
>>
>>101521705
Yes, I had too much context
>>101521710
>>
>people suddenly trust benchmarks now
>>
>>101521558
are that the instruct or base benchmarks?
>>
>>101521744
instruct
>>
File: 1702441576627303.png (6 KB, 227x217)
>>101521738
>people
>>
>>101521744
base
>>
>>101521755
>>101521755
>>101521755
>>
>>101518202
>ooba kinda dropped amd support
he's still working on his thing, you have to switch to the dev branch for that
>>
>>101521353
>bad benchmarks
that proves something, it's no use to go well over 70b, the transformers architecture kinda plateaus after that, that means that OpenAI and anthropicAI have something other than just giant models
>>
>>101521700
I doubt the base model was cucked though? or is it?


