/g/ - Technology


Thread archived.




File: 39_04247_.png (1.03 MB, 896x1152)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101488042 & >>101474151

►News
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101488042

--Critiquing the repetitiveness of AI-generated content and reflecting on the appeal of erotic themes: >>101489564
--Wtf Fish Audio: Surprisingly Good Quality, but a Bit Slow: >>101490020 >>101496116 >>101496152 >>101496782
--Uncanny Valley in Erotic Text Generation: Overcoming Shiverslop with Data Diversity and Artificial Augmentation: >>101491690 >>101491779 >>101493848 >>101493894 >>101494179 >>101494246 >>101494626 >>101494952 >>101493921 >>101491884
--The Eternal Curse of GPT-isms in LLMs: >>101488157 >>101488303 >>101488469 >>101495367
--Seeking Small Local Text Model for Wording Improvement and Corpo Sugar: >>101494211 >>101494503 >>101494655
--Release of LimaRP-DS dataset and the sunfall-v0.5 model: >>101492700 >>101492792
--Mistral NeMo Instruct System Messages Rant and the Emergent Nature of AI Functionality: >>101492374 >>101492849 >>101492864 >>101492914 >>101492938 >>101493650 >>101492971 >>101493739
--Is Mistral FP8 Really Lossless?: >>101489057 >>101489085 >>101489630 >>101489120 >>101489221
--Purchasing a Server with 8xAMD Instinct Mi100 32GB GPU's for ?7000: Is It Worth It?: >>101490545 >>101490725 >>101490998 >>101491029 >>101491087 >>101491207 >>101491640 >>101492046 >>101492116 >>101490787
--Mistral Compatibility with llama.cpp: Unofficial Solution Available: >>101492182 >>101492241 >>101492318
--400b Model: Savior or Coffin Nail for Open-Source LLMs?: >>101490382 >>101490423 >>101490811 >>101493445
--Miku (free space): >>101492219

►Recent Highlight Posts from the Previous Thread: >>101488050
>>
>>101492328
I download random chars off the internet and try to speedrun consensual sex.
>>
Rin sex
>>
Do (you) think 2025 will be the year the various LLM companies start rushing multimodal models, or do you think they will stick with improving LLMs that year?
>>
>>101497330
I just want CLIP tiling. Image generation is retarded.
>>
File: femslop.png (260 KB, 488x656)
>>101497205
here is an excerpt of a sex scene in haunting adeline
women get off to this
shiverslop is steinbeck in comparison
>>
>>101497391
It's not about the exact writing, it's the interaction/attention to what you're saying that makes it so good.
>>
>decide to finally check out all the cards that exist for a certain popular character
>literally not a single one of them is good
>the one that's tryhard and has more lore details is full of ESL mistakes that make it hard to understand
Jesus.
>if only you knew how bad things really are
>>
>>101497391
Model?????
>>
>>101497458
Stop bitching and make your own cards
>>
>>101497330
The big companies like meta. Most won't fall for the meme
>>
>>101497516
Human-100B
>>
File: 2hzG.gif (727 KB, 500x284)
I'm at ICML, if anybody wants to meet up
>>
>>101497458
>using other people's characters
Why?
>>
>>101497526
I do though.
>>
>be you
>spend 2000$ on gpu to make big booba anime girl
>be me
>spend $0 to look up big booba anime girl, somehow looks better
I think you've all been cheated.
>>
>>101497605
wrong general
>>
>>101497605
I can talk with mine.
>>
>>101497605
>be you
>spend 2000 hrs looking for specific big booba anime girl in specific pose
>be me
>spend 1 hr generating big booba anime girl in that pose
i think you've been cheated
>>
>>101497605
If you were a true gamer you wouldn't have that problem
>>
>>101497576
I was just curious what the state of things was as I never actually bothered to conduct a full read of the landscape. I mean I knew things were bad but I feel like there should've been at least 1 good card out of the 6 that popped up.
>>
>>101497636
>1 hr
wtf anon
>>
>>101497574
what is that?
>>
>>101497702
"I Cum to Machine Learning"
>>
>>101497691
put effort into your sloppa
>>
>>101497560
don't be harsh, women have at least 150b
>>
>this is not an up-merged 70b.
https://huggingface.co/PrimeIntellect/Meta-Llama-3-405B-Instruct/discussions/1
well vram chads?
>>
File: neurons.png (35 KB, 695x291)
>>101497803
>>
>>101497856
sigh *unzips vram*
>>
>>101497996
Nb is the number of parameters, not neurons
each "artificial neuron" has multiple parameters
>>
>>101497996
Biological neurons are way more complex than simple MLP weights.
>>
>>101497856
Just test the 8b, "vocab_size": 128256
>>
Comparing biological neurons to parameters in a digital neural network is ridiculous.
>>
https://huggingface.co/mradermacher/Meta-Llama-3-405B-Instruct-Up-Merge-GGUF/tree/main
>only q8
aaaaaaaaaa
i could technically fit it on my external drive but trying to quant from there will take ages
>>
>>101498092
oops misread context size
>>
The new mirror is decent, but the difference between 8/9/12 and even 27 seems to be so fucking small that it makes me rather skeptical about where LLMs are heading. Or rather, I got so spoiled since the original Llama was released that I'm not even able to perceive the great steps we're seeing. I mean, I would have killed for functional 128k context a year ago, and yet now that I have it, it just feels kind of ok. I have officially become a retard.
>>
>>101498101
How else can we compare?
>>
>>101497148
More.

>>101496965
Really? Haven’t the worst offenders stopped bot making entirely?
>>
>>101498156
you don't
>>
>>101498107
>not having a beowulf cluster of multi-petabyte iomega drives
>>
>>101498177
There has to exist something else we can compare it with.
>>
>>101498101
Human brains are the ideal architecture for intelligence. If 100B fully connected parameters can't match the performance of a human brain, it's over
>>
>>101498156
We've been comparing CPU and brain processing speed for decades. It never made sense, and it still doesn't.
>>
>>101497856
Definitely seems like something is wrong with that config. 70B has 80 hidden layers. How could this have 10?
>>
What happens when you get really high and rp
>>
>>101498219
That's not my point. My point is that comparing digital to biological neural network by metrics other than their outputs is retarded. Just because they're called "neural networks" and have what we call "neurons" doesn't make them comparable.
>>
>>101498255
it neither has 80 nor does it have 10 layers you absolute mongoloid
>>
>>101498237
By all means, offer an alternative.
>>
>>101498308
>>101498255
well now it has none
>404
>>
>>101497996
Parameters do not have a fraction of the flexibility of neurons. This might eventually change with neuromorphic computing, but until that happens you will need many more parameters than neurons to compensate for their limitations.
>>
you did download it, right?
>>
>neuromorphic computing
>>
>>101498337
fuck no, i ain't wasting a tb on 8k context
>>
Neural networks are just smart onions
>>
>>101498319
I just did
>>101498305
>My point is that comparing digital to biological neural network by metrics other than their outputs is retarded.
By that I mean that what matters is the output. If the output is indistinguishable from that of a human and it can do exactly the same kind of processing we do, then we can start saying "this many parameters == this many neurons/synapses". But even then, better tech could show up that changes the ratio. So there's no reasonable comparison until we get to a ratio of 1. And it won't be a stable number either.
>>
File: 1707707160777151.png (54 KB, 749x136)
pffff, alright, i keked
>>
>>101498380
NTA
I still prefer the "binary number of decisions per second" and "max associations in working memory" as metrics for human cognition.
Biological neurons vs MLP weights doesn't make any sense. Real neurons have a location in space and *move.* They grow connections, have internal chemical state, respond to different stimulus frequencies differently etc. They're insanely complex and really nothing at all like the weights in ML models.
>>
>>101498513
I miss when fine-tuning was done out of the passion of having better bots and not to gain discord karma
>>
>>101497148
What a waste. All that text just to start every sentence nearly the same way with she and her just like an 8b. Did you ban all proper nouns tokens or something?
Also:
>Permit me
>Permit me
>Permit me
>Permit me
>>
File: 1645307010138.png (2 KB, 179x139)
If quantizing models to 8 bpw is virtually lossless, why don't model makers just train their models like that natively and cut off the fat?
>>
>>101498702
>If quantizing models to 8 bpw is virtually lossless
it's not
>>
>>101498702
I guess it's still very new. But that's what c.ai does for their models.
>>
>>101498728
yes it is
point to any piece of evidence that shows a perceptible difference between Q8 and full precision
>>
>>101498702
>virtually lossless
Long way to spell "lossy".
>>
>>101498702
stability issues, specialized kernels and complexity
if it was easy to do, everyone would be doing it, but 90% of the people who use llms don't know how to do anything that isn't already built in to whatever pipeline they're using
the big players are already using int8 kernels for training, they even have bitnet up and running
>>
>>101498328
Welp. I did at least download the config file so here it is. https://files.catbox.moe/sx8b38.json

>>101498308
There's a reason why I said "that config" and not "that model". And in this reality, the config lists the number of hidden layers as 10, or at least it did. Check the above link and the screenshot of the page I still had sitting in my tabs.
>>
File: 1542502103649.jpg (35 KB, 400x400)
>>101498763
>>101498798
Neat.
>>
>>101497246
how to render images/memes from text prompt, on my local GPU on windows, which link is for that?
>>
>>101498871
>>>/g/sdg
>>
Will roleplay probably be integrated with video when it starts being a thing?
>>
>>101498818
Also this is the safetensors index file. https://files.catbox.moe/xsdpkb.json
>>
>>101498219
The human brain has 100T connections. Parameters in a model are parameters not neurons. One parameter is a connection between two neurons.
>>
>>101498975
*are connections
>>
>>101498513
>buy an a-ACK
>>
>>101498975
So GPT4 is 1% of the power of the human brain? Nice. We just need 100x that.
>>
>>101498818
This ehartford guy is a retard. He has never done anything useful and his wife is a nigger.
>>
im currently trying to setup chameleon30b but it seems overly complicated and i keep getting errors. i followed the guide on github but i keep getting this:
https://files.catbox.moe/i51xq0.txt
>>
>>101498954
Well this is odd. Now that I look at this and compare it to 70B's index file, the total_size is not within expectations.
https://huggingface.co/PrimeIntellect/Meta-Llama-3-70B-Instruct/blob/main/model.safetensors.index.json
16060522496 vs 141107412992
16 GB vs 141 GB
>>
>>101499027
>He has never done anything useful
uncencosred wizard vicuna, samantha dolphiin yea no he better than u
>>
>>101498975
No it's not. 1 parameter can take many parameters as input and send it's output to many other parameters. The neural nets we have now are more densely connected than the human brain. A single MLP neuron in LLaMa 8B has >2000 incoming and outgoing connections.
>>
>>101499048
I can make a trash model in 20 minutes too
>>
>>101498650
You're trying too hard. It's not organic.
>>
>>101499033
>16060522496
Corresponds exactly to l3-8b
https://huggingface.co/PrimeIntellect/Meta-Llama-3-8B-Instruct/blob/main/model.safetensors.index.json#L3
>>
File: 1710738955138884.jpg (80 KB, 760x980)
>>101498871
>>101498883
Or he could just use Kobold and stay here
>>
>>101499031
Your system is broken.
>>
>>101498702
If bitnet is virtually lossless why don't model makers just train their models like that?
>>
>>101499088
how do i fix it?
>>
>>101499031
>/usr/lib/python3/dist-packages/requests/__init__.py:87: RequestsDependencyWarning: urllib3 (2.2.2) or chardet (5.2.0) doesn't match a supported version!
Ah, good old version dependency hell.
>>
>>101499071
Oh kek. So I guess the weights were real but the guy somehow only got the weights and none of the other files, so those were placeholders? I don't think a company (the uploader) would try to do fake joke/scam uploads at least.
>>
>>101499054
>1 parameter can take many parameters as input and send it's output to many other parameters
what the fuck are you trying to say
>>
>>101499094
Because it takes months to do the pretraining for base models and the bitnet paper came out in the middle of the latest slew of releases. So we might not see serious bitnet models until late fall.
>>
>>101499083
is that image ai
>>
>>101499114
Are you not in a venv? On my machine pip won't even run without a venv.
>>
>>101499116
a single weight gets >2000 other weights as inputs to a non-linear function, then sends that output to >2000 other weights
a single neuron can have up to 15k connections to other neurons, but you don't need all of those for a good enough approximation
>>
>>101498702
The other answers don't know what they're talking about. The actual reason is that the final weights are determined by billions of tiny nudges accumulated over many training steps, and while the final weights may only need 8 bits of precision, the small nudges during backpropagation, which require higher precision to be represented, are substantial enough when added up to make a difference
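To make the "small nudges get lost" point concrete, here's a toy sketch (mine, not anyone's actual training code; float16 stands in for an 8-bit format since numpy has no fp8 dtype):
[code]
# Accumulating tiny gradient "nudges" directly in a low-precision weight loses
# them entirely, while a higher-precision accumulator keeps them.
import numpy as np

nudge = np.float32(1e-4)   # one small update, below float16's step size near 1.0
steps = 10_000

w_low  = np.float16(1.0)   # weight stored in low precision
w_high = np.float32(1.0)   # weight stored in higher precision

for _ in range(steps):
    w_low  = np.float16(w_low + np.float16(nudge))  # rounds back to 1.0 every time
    w_high = w_high + nudge

print(w_low)   # ~1.0: every nudge vanished
print(w_high)  # ~2.0: the same nudges, accumulated in fp32, add up
[/code]
Which is part of why training setups keep master weights / optimizer state in higher precision even when the stored weights end up low-bit.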
>>
>>101499143
still gibberish, you're confusing weights, neurons and activations
"2000 weights are used to weigh 2000 input activations (from 2000 other neurons), calculating an activation through a non-linear function, which in turn gets sent to 2000 other neurons"
the number of weights per neuron is still 2000 in this example, which doesn't contradict what >>101498975 said
>>
>>101499114
pip install wont allow me to update either of those dependencies, so im not exactly sure what to do
>>101499136
im not in a venv, no
>>
>>101499187
1 parameter is not equivalent to a connection between two neurons, dumbass
>Parameters in a model are connections not neurons
No, they're neurons. They're designed to be approximations of neurons, and a connection between two parameters is an approximation of a connection between two neurons.
>>
>>101499196
You can't do this stuff outside a venv. You'll want to kill yourself if you try.
>>
>>101499196
>pip install wont allow me to update either of those dependencies
Is there a requirement file? If so, you should check the specific package versions in it.
It could be that you have too new a version instead of too old, or that you need a specific version for a given parameter.
>>
Best settings for Nemo? Neutralizing samplers, setting min-P to 0.001 and temp to 1 works for most cards, but it struggles with others. Anyone found an optimal config? Running rep penalty at 1.06 and DRY sampling at 0.8 - Context and instruct templates set to Mistral.
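For reference, this is roughly what those numbers look like if you hit a llama.cpp-style server directly instead of going through ST; field names are from the /completion endpoint as I remember them, so verify against your backend, and note DRY isn't in mainline llama.cpp, it's a frontend/fork thing:
[code]
# Hypothetical request using the settings from this post; adjust the names and
# endpoint to whatever backend you actually run.
import requests

payload = {
    "prompt": "[INST] Write one short paragraph of scene narration. [/INST]",
    "n_predict": 200,
    "temperature": 1.0,       # drop toward ~0.3 if it goes off the rails
    "min_p": 0.001,
    "repeat_penalty": 1.06,
}
r = requests.post("http://127.0.0.1:8080/completion", json=payload)
print(r.json()["content"])
[/code]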
>>
>>101499054
A parameter is a weight describing the strength between two neurons. Your words do not align with this fact.
>>
>>101499240
>temp to 1
Isn't the official guidance to use temp of 0.3?
>>
So if I wanted to update llama.cpp, how would I go about it? It doesn't seem as simple as a git pull, because it would have to be re-compiled, right?
>>
>>101499254
If the card isn't trash, I've found temp 1 seems to work just fine. Lowering it too much seems to introduce more GPT-isms.
>>
>>101499226
how do i run this in a virtual environment then?
>>101499230
i dont see a requirements file anywhere. has anyone here actually run a multimodal model successfully before?
>>
>>101499031
Try to reinstall docker compose. The version from my package manager is 2.29.0, and that log says 1.29.2?
>>
>>101499275
>has anyone here actually run a multimodal model successfully before?
no
>>
>>101499275
Give me the link to the repo you are running.
>>
>>101499258
Why not pull a fresh copy?
If you used any of the Python AI projects, you know you never update those fuckers and pull clean or else everything goes all Python everywhere and you regret the invention of the transistor till you pull fresh.
>>
>>101499223
nigger you don't get to redefine what parameter means to win internet arguments
when meta says that their model has 70b parameters, they mean that it has 70 billion weights, not 70 billion neurons
the number of weights associated with each neuron is the number of its input neurons (+ 1 bias, usually)
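Quick sanity check of that counting with a single toy layer (sizes made up, just to show the ratio):
[code]
# One fully connected layer: n_out "neurons", each with n_in input weights + 1 bias.
n_in, n_out = 4096, 4096

params  = n_out * (n_in + 1)   # weight matrix plus bias vector
neurons = n_out

print(neurons)           # 4096
print(params)            # 16781312
print(params / neurons)  # 4097.0 parameters per neuron
[/code]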
>>
>>101499281
it wont let me update it with pip for some reason.
>>101499297
https://github.com/facebookresearch/chameleon
>>
>>101497996
that's fake news, it's an old myth taken from a dirty ass. The human brain doesn't have 100B neurons, only about 86B, and the cerebral cortex is only 16B. Most neurons don't contribute to thinking.
>>
>>101499240
Nemo is super creative. Like you said 1 works for most but turning it down can help when it goes off script.
>>
File: 93379188.png (404 KB, 684x630)
>>101499384
>Most neurons don't contribute to thinking
Maybe yours don't
>>
>>101498975
which contribute to shitting, sleeping, eating, digesting, breathing, jerking, moving and a whole host of stuff you don't need in an artificial brain.
>>
>>101499430
Interesting theory but if I were a brain I would optimize most of that into a very small part of my total computation power.
>>
Y'all, the numbers don't matter, only the effectiveness.

Use metrics that matter like if the LLM gets answers to serious questions right, makes up fun lies in role plays, and would rather play a game of chess.
>>
I have an EPYC 7713 and 512GB of DDR4 2400MT/s ECC registered RAM. is this good enough to run any decent sized models on CPU or should I just stick with running on my GPUs?
>>
>>101499430
>artificial brain doesn't require jerking neurons
I hate that people like you work on the highest level of these things.
>>
I can't believe I fell for the nemo meme
>>
>>101499438
There is an entire section of your brain dedicated to keeping your internals running, and another section dedicated to regulating hormones to keep everything on a fixed schedule
At least a third of your brain (the non-neocortex part) is not needed, and not even the entire neocortex is necessary
>>
File: 39_04641_.png (1.71 MB, 896x1152)
What's your daily driver these days /lmg/?
>>
>>101499438
>if i were a brain
who's gonna tell him?
>>
>>101499464
Ok I stand corrected. We’re now down to 1/3rd of 200T so ~70T.
>>
>>101499465
For code, L3 and Deepseek 33.
For RP, kinda whatever, switching around to keep things kinda fresh. No one thing seems to earn the chef's kiss. Feels like every model can be diamond one session and charcoal the next.
>>
>>101499492
how the fuck did you go from 100T to 200T
>>
>>101499425
that's correct. My brain thinks mostly in lobus frontallis unlike yours, which right now, most likely uses visceral nervous system that contribute to defecation.
>>
>>101499499
Meds now.
>>
>>101499465
claude 3.5 lol.
If I use local its 27B though.
>>
>>101499511
oh this is an LLM isn't it
which model?
>>
>>101499517
based, but isn't it less creative than 3?
>>
>>101499464
>>101499479
Oops I misread. I thought you said we only use a third but you said we don't use a third. So we're at ~140T parameters actually.
>>
>>101499523
You’re mixing brain neuron count (100B) with brain neurons connection count (200T). We’re talking about the fact models should be compared to the latter.
>>
>>101499560
why is your math so fucked
we started with 100T >>101498975 1/3rd not being used means we only need 66T
where did you even get 200T from? you're the first person ITT to bring up that number
>>
I have 6 4060TIs, totaling to 96gb of vram. What is the best model that I can run?
>>
>>101499569
Fuck me. Im gonna stop talking now.
>>
>>101499325
>it wont let me update it with pip for some reason.
Install/update it with your OS's package manager instead.
>>
>>101499450
Quad-channel? Meh. Octo-channel? Ok-ish.
>>
>>101499623
one of the L3 70B tunes
unfortunately there's not many of them to choose from, Instruct is pretty good on it's own though
>>
>>101499315
I was hoping to avoid that because Cuda llama.cpp takes a good 30 min to compile even with a fast pc
>>
Okay for some reason ST (or KoboldAI) has begun to reprocess the whole context on every single message...

No fucking clue.
>>
>>101499784
Use the -j flag.
>>
>>101499758
So don't compile all the binaries?
>>
>>101499758
Use ccache and a small change takes seconds to recompile.
>>
>>101498790
>>101498702
>lossy
There is always a chance of losing some bonds and shivers.
>>
>>101499494
>diamond one session and charcoal the next
This. Not one model has all the strengths (large context, creativity, less slop/purple prose).
I'm liking the speed and context for mistral-nemo but it still has plenty of flaws.
>>
>>101499677
yes, it is octo channel
>>
>>101497391
The main difference though is that there is actually structure here; it's not just repeating sentences before moving forward a bit and building up to something. Everything is succinct and moves the plot forward even if the vocabulary is worse. Some of the LLMs are getting there, but they aren't there yet.
>>
File: 1680276445596838.jpg (181 KB, 1024x768)
>>101499915
Take a nice Q6-Q8 gguf of WizardLm 22x8 and/or CR+ for a spin and see how you feel, just make sure to configure llama/kobold right, CPU inference needs every optimization you can squeeze out of it.
>>
It's funny to see that CoT is so deep-fried in most models that a question like:
>Two cars are travelling towards each other at 30km/h, at the start they were 50 km from each other. How far apart were they at the moment of the collision?
Causes them to do a bunch of useless calculations when the answer is obvious from the start. This causes some models to even get the wrong answer at the end.
>>
>>101500054
god I love that lora
>>
>>101498294
You coom. I speak from experience.
>>
>>101500054
and what if I also have 7 4060tis in addition to my 64 cores and 512gb of RAM? does that change your recommendation?
>>
>>101500107
Spiritual experiences may also happen
>>
>>101499786
I checked the API and realized ST is exceeding the max token count when sending in the prompt...
>>
File: 1720944611738558.jpg (89 KB, 900x750)
Just COOMED to Mixtral Nemo. Here are my thoughts.
>Mistral-Nemo-Instruct-12B-exl2-8.0bpw using ooba/ST
>70k tokens
>Mistral context template
>Mistral instruct presets
>temp 0.5
>minp 0.02
>rep penalty 1.2

So far so good. Language feels pretty natural and mostly unslopped with a few exceptions. Followed my card well. It's got good spatial awareness and is completely uncensored. Pretty smart, although I can't make a definite determination on where it stands because I haven't used any 70b models before. It is most definitely smarter than llama 3 8b and Mixtral 8x7b though.

I did notice that it started to become a bit dumber the longer I prompted it. Got to about 30k tokens before I stopped. It wasn't a terrible decline in intelligence and memory, but definitely noticeable. I also noticed that the longer you go, the more its likely to repeat past messages almost verbatim. I'm unsure if this issue is just a problem with the model, or the presets and/or samplers I'm using. I've read that the exl2 quants have issues with them similarly to the llama.cpp ones. Hopefully they get ironed out quick.
>>
File: 1682729528395.png (1.25 MB, 1024x1024)
>>101500130
At that point you'd generally be better off running lower quants on GPU then, the only reason for a setup like that to use CPU inference is to go full madman and max out 128k context on fat models.
>>
Video generation could revolutionize graphics and video editing. You'd be able to gen animations
>>
>>101500243
Solved by force ST to use the api tokenizer. Though I've never had to do that before, what the fuck.
>>
>>101500271
interesting. the reason I bought the EPYC was for the 128 gen 4 PCIe lanes so I could run a shit ton of GPUs at max speed. I was just curious to see how capable my CPU would be by itself.
>>
File: 6086.png (90 KB, 450x274)
kek?
>>
>>101500287
And we're back to reprocessing the whole thing. I've no idea. I even dialed the context back a bit just to force it down in size but nope.
>>
>>101500262
how muh vram it takes?
>>
>>101500331
This is what Yann LeCunn envisioned when he invented Transformers
>>
>>101500384
70k context takes about 23gb of VRAM. Could probably fit more with 8bit cache.
>>
>>101500399
did you do any experiments with optimization? is that true nemo doesn't fit into 24 GB in fp/bf16?
how muh vram kv cache quantization would save? I guess you could quantize like kv 8/4 or even 4/4 in llama.cpp provided it works as intended . Any thoughts?
what was the largest context you was able to throw in? is 30k the limit for retardation?
>>
File: smugtommy.jpg (22 KB, 322x294)
>>101500331
>Rule 34
What's the big deal?
>>
>>101499462
It is ok. You just have to wait until the loader gets fixed. Of course by that time 3 new models will drop and nobody will be talking about nemo anymore.
>>
>>101500331
>>101500392
>>101500483
reddit called, asked where are you.
>>
>>101500262
>because I haven't used any 70b models before
Thank you for disclosing that you are incapable of providing any review. I wish more people did this.
>>
>try 8bpw 70b on cpu
>can't keep track of clothes within 2 posts with the same vocab of an 8b but it can do riddles better
why use anything but cloud honestly
>>
>>101500262
>Mixtral Nemo
>Mistral-Nemo-Instruct
>Mixtral
>Mistral
bro?
>>
>>101500262
Show logs
>>
>>101500577
GGUF is garbage for poorfags that objectively has worse outputs than EXL2 and if you can't run CR+ minimum fully loaded into GPU you should not hold any opinion on open source
>>
>nobody talking about 236B
I take it when 405B drops nobody is gonna be talking about it either? Was /lmg/ always just 100% vramlets with a few vramlets larping as vramchads?
>>
>>101500642
It's going to be weeks before llama.cpp supports 405b anyway.
>>
>>101500642
Motherfucker if you have 200 gigabytes of vram go fuck yourself and suck my dick
>>
gemma-2 9b and 27b with fixed pre-tokenization and an added iMatrix that contains many Japanese words.
https://huggingface.co/dahara1/gemma-2-27b-it-gguf-japanese-imatrix
https://huggingface.co/dahara1/gemma-2-9b-it-gguf-japanese-imatrix
>>
>>101500642
I tried it. It was worse than Gemma 2 27B.
>>
>>101500711
>t. IQ2_XXS
>>
>>101500642
Get OR to host the base model and I will post about it.
>>
>>101500595
>objectively has worse outputs than EXL2
I haven't seen anyone post comparisons. If you have them it would be welcome. I didn't get into Exllama since it didn't seem like it had anything over Llama.cpp but I'll transition if it does.
>>
>>101500676
I have 12 gb of vram, can I still suck it?
>>
>>101500758
i dont know about quality, but i do know that EXL2 is better optimized for multi-GPU setups. you should avoid GGUF if you have more than one GPU
>>
>>101500810
Go ahead, it's all yours my friend
>>
>>101500828
no. i won't. *crosses arms.*
>>
>>101500828
ok mr no default auto-splitting across the gpu and FA enabled by default
>>
>>101500595
gguf has better quantization performance than exl2. Do a KL divergence test and you will prove it. Don't use a base model, use an instruct model for the test.
>>
File: 1717151251648139.jpg (115 KB, 1280x989)
I finally gave Nemo a try (5BPW), my first impressions of it are... mixed.

I don't know if I'm doing something wrong, but the model is EXTREMELY sensitive to samplers. A high temperature or a high/moderate repetition penalty makes the model break the format constantly. And without any repetition penalty the model starts to repeat itself verbatim constantly.

Besides that, the model is very very dumb, but it's surprisingly good at writing ERP and feels like it has 0 positivity bias, I'm seeing really novel shit in my mesugaki loli cards.

Parameters:
>Temp: 0.3
>Rep Pen: 1.2
>Format: Mistral
>>
>>101498513
>actually bought an ad
>>
Slightly different topic but given the current discussion I was curious to see if there's any quality differences (and thus issues) between offloading configurations on Llama.cpp, so I've done another KLD test. I originally did >>101465239 with all layers offloaded to GPU 1. For this new test I did the same model except now I have offloaded some layers to GPU 1, some to GPU 2, and some to CPU. The results are below. It's basically the same (there is a small dif but below margin of error).

====== Perplexity statistics ======
Mean PPL(Q) : 7.084136 ± 0.050764
Mean PPL(base) : 7.128723 ± 0.051077
Cor(ln(PPL(Q)), ln(PPL(base))): 99.58%
Mean ln(PPL(Q)/PPL(base)) : -0.006274 ± 0.000660
Mean PPL(Q)/PPL(base) : 0.993745 ± 0.000656
Mean PPL(Q)-PPL(base) : -0.044587 ± 0.004703

====== KL divergence statistics ======
Mean KLD: 0.017832 ± 0.000251
Maximum KLD: 13.449598
99.9% KLD: 0.899704
99.0% KLD: 0.191979
Median KLD: 0.005399
10.0% KLD: 0.000041
5.0% KLD: 0.000007
1.0% KLD: 0.000000
Minimum KLD: -0.000023

====== Token probability statistics ======
Mean Δp: 0.126 ± 0.011 %
Maximum Δp: 95.268%
99.9% Δp: 36.992%
99.0% Δp: 12.519%
95.0% Δp: 5.049%
90.0% Δp: 2.773%
75.0% Δp: 0.475%
Median Δp: 0.000%
25.0% Δp: -0.402%
10.0% Δp: -2.495%
5.0% Δp: -4.575%
1.0% Δp: -10.820%
0.1% Δp: -25.043%
Minimum Δp: -93.939%
RMS Δp : 4.006 ± 0.042 %
Same top p: 94.765 ± 0.059 %
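For anyone skimming these dumps: Mean KLD is the average per-token KL divergence between the reference model's next-token distribution and the quant's, so lower means the quant's predictions are closer. Toy illustration of the quantity being averaged (not the llama.cpp code itself):
[code]
# KL divergence between two next-token distributions over the same 3 tokens.
import numpy as np

def kl(p, q):
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return float(np.sum(p * np.log(p / q)))

full  = [0.70, 0.20, 0.10]   # reference (e.g. fp16) probabilities
quant = [0.68, 0.21, 0.11]   # slightly perturbed, like a decent quant
print(kl(full, quant))       # ~0.001 - small, the distributions nearly match
[/code]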
>>
>>101500975
max context?
>>
File: 1719884616553296.jpg (503 KB, 1424x2144)
should i try a quant of qwen that fits on my 24gb card, or just stick with mixtral?
>>
>>101498513
What the fuck... the madlad actually did it
>>
>>101500676
go swing from a rope little buddy. your whore mother will be thankful
>>
>>101499494
>For code, L3 and Deepseek 33.
Is this better than the wizard 8x22b? I've had best luck with that for all uses but it's still not great.
>>
>>101501215
No, not even close. Just around 8k.
>>
>>101500975
>>101500262
I've just read on llama.cpp repo that flash attention degrades the quality of Nemo with the long context. could be the same in exllama?
>>
What is the new prompt format for nemo?
>>
>>101501250
Try Gemma 2 27B or Mistral Nemo.
>>
File: 1721421234867589.png (131 KB, 1355x470)
>>101501466
This one.
>>
>>101501483
but i keep reading from the thread that gemma 2 is a mixed bag that ultimately isn't as good for word sex as mixtral base and merges are
>nemo
i have it, but haven't loaded it, i'll give it a shot today
>>
>>101501499
What in the world were the french thinking?
>>
>>101501499
i refuse to believe that's the case since the order is fucked
>>
nemo 8x12 when?
>>
>>101501561
I don't know, the last time I used Mixtral was when I still had 24GB VRAM, before the Miqu leak, that model is obsolete to me. I have 48GB and I still play with Gemma 2 or Nemo currently. That anon was probably lying.
>>101501579
The system prompt goes in the last user message, and the official API adds an empty user message if the prompt doesn't start with one.
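In other words, roughly like this (my reading of the template; treat the exact whitespace and tokens as an assumption and check mistral_common if you need it byte-for-byte):
[code]
# Sketch: the system prompt gets prepended to the LAST user message, separated by
# a blank line; everything else is the usual [INST] ... [/INST] alternation.
def build_prompt(system, turns):
    # turns: list of (user, assistant) pairs, assistant=None for the final turn
    out = "<s>"
    for user, assistant in turns:
        if assistant is None and system:
            user = system + "\n\n" + user
        out += "[INST] " + user + " [/INST]"
        if assistant is not None:
            out += " " + assistant + "</s>"
    return out

print(repr(build_prompt("You are Rin.", [("hi", "hello there"), ("how are you?", None)])))
# '<s>[INST] hi [/INST] hello there</s>[INST] You are Rin.\n\nhow are you? [/INST]'
[/code]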
>>
>>101501608
2 weeks
>>
File: 1694382291150920.jpg (278 KB, 1080x1698)
Picrelated, coming soon in local meme!
It's already here tho, if we count mental gymnastics such as prompting the model to be evil or based while it tries to lecture you on some irrelevant bullshit and loses coherency as the message count goes up.
>>
>>101497246
Change the OP benchmark for programming for either this https://huggingface.co/spaces/mike-ravkine/can-ai-code-results or this https://prollm.toqan.ai/leaderboard/coding-assistant
These are much more recent
>>
>>101501855
it wouldn't matter on local
the instruction hierarchy is about setting precedence for obedience between OAI's backend identity prompts, system prompts used by services, user messages, etc.
on local you have full access to the prompt so you could set its core identity to be miku, anon's submissive coom slave and in theory it would actually stick better because it would be harder for it to get confused about its role over the course of many messages
>>
>>101501213
I have done another test now that I was curious about, which was varying -b and -ub. I did combinations of 2048 and 256. So ultimately there were 4 tests. And all of them came back the exact same. I tried this because I heard someone claim before that these parameters affected quality. That seems not to be the case at least in this test with my build version.

Something else I noticed about prompt processing. It seems that on a single GPU with full offloading, both flags at 256 is faster than the other combos, by around 4% compared to both at 2048. However, when doing offloading to two GPUs, both flags at 2048 was actually faster, and by 21%, surprisingly.

Now it'd be even more interesting if I had token gen speed statistics, but the KLD test doesn't happen to generate these, so I don't know how that would diff. But at least it seems that if someone is doing split GPU offloading, a larger value for both -b and -ub is beneficial for prompt processing. But this was a small dense model (L3 8B), and something as different as a large MoE model like Wizard could give different results. So many variables at play. Ultimately perhaps the defaults Llama.cpp comes with are fine for general setups and use cases.
>>
Anyone have a good line or two to prevent shit like out of character narration or
>What will {{user}} do next?
shit? I've tried a couple but they don't appear effective.
>>
>>101501629
willing to share logs for gemma?
>>
>>101501942
Drink your own piss to get exclusive™ access for /lmg/™ jailbreaks™.
>>
is boobabooga ded
>>
>>101501942
tell it to not say that.
>>
>>101501499
Huh? So all that separates the system message is two newlines? What if your user message and/or system prompt has two newlines (or multiple), how would the model know where the system prompt ends and where the user message starts? I mean I guess it could contextually "guess", but I imagine some cases where that would not be so easy to guess. This just seems unnecessarily confusing.
>>
>>101500828
(You)
> know
jack shit.
>>
>>101501855
this 4o mini is beyond cucked, I gave it a test run on translation and it lost to gpt 3.5 by far
>>
>>101501991
Ok, I drank it. Now give.
>>
File: angryayumu.webm (655 KB, 640x480)
>STILL no llama.cpp jamba
>>
found nemo
>>
https://huggingface.co/neuralmagic/Mistral-Nemo-Instruct-2407-FP8
It doesn't seem to output random Chinese characters with vLLM, like exllama sometimes does.
>>
VRAMlets, are you able to run Nemo at high contexts or are we still stuck at 8k? The VRAM calculator in the header doesn't support Nemo yet.
Will I be able to run it at ~32k context without a heady aphrodisiac of rivulets burning at the core of my 12GB VRAM?
>>
>>101502295
Never will be.
>>
>>101502390
W-were people using quants not FP8 even though mistral said to use FP8?
>>
Is DeepSeek lite good? I tried it to make code and doesn't werk
>>
>>101502469
It's like 16B. Of course it's not good.
>>
>>101502498
Guess I will delete it then, haha
>>
>>101502498
What makes the best code?
>>
>>101502541
corpo models
>>
>>101502554
Oh, so I have to pay?
>>
File: .png (9 KB, 643x55)
Having ended a session with Nemo, it's much better than Deepseek, but you still need to hold its hand a good deal and swipe a lot. It's not at 8x7b levels or at 70b+ capabilities where the model gets to the point that it knows what you're implying and the nuances of language, but for what it is, it's pretty impressive. It's the 13b that llama3 should have given. Finetunes on it will be pretty good.
>>
>>101502594
Gemma, I mean, There are too many of these models to go through.
>>
>>101502594
>It's not at 8x7b levels
Really? With all the hype it got at first you'd think it would've at least surpassed the old mixtrals.
>>
>>101500975
Update: WELP... This model is awesome.
It's dumb, but it writes interesting stories that keep you hooked. It doesn't shy away from anything, it moves the story forward. I almost feel like THIS is the C.AI soul local was missing.
>>
>>101502639
This really. Still feels obviously dumber for stuff the big models do but its soul makes it better for RP / creative writing than them anyways as long as you aren't going for some really complicated mechanics. Hope we end up with a larger mistral trained the same way.
>>
>>101502295
The bits are being assembled:
>https://github.com/ggerganov/llama.cpp/pull/8546
>https://github.com/ggerganov/llama.cpp/pull/8526
>https://github.com/ggerganov/llama.cpp/pull/7531
>>
I finally managed to get my Tesla P40 cards running. All it took was a single setting in BIOS.

> Above 4G Decoding

Without it, the system won't boot even to BIOS, and with it, everything works flawlessly.

10 tokens/sec on a L3-70B loaded onto one 3090 and one P40.
>>
>>101502594
what's that backend? how much mem does it take and what's your GPU? does the model go wacky beyond 30k input context as anons reported?
>>
>>101502611
how could a 13B possibly be better than a 50B MoE from the same company?
>>
File: file.png (59 KB, 648x295)
>>101502767
TabbyAPI. Running the full FP16 model at 64k context. Dual 3090s and it's only using 33GB of VRAM, I can definitely get more context if needed.

I've only done one session up to 38k context, and it was just fine. Samplers neutralized, temp to 0.9, smoothing to 0.2
>>
File: Kerfus.png (255 KB, 550x550)
TWO
MORE
WEEKS
>>
>>101502825
two more days until 400B and a surprise smaller model that will save local models
>>
File: .png (35 KB, 714x274)
>>101502821
>>
>>101497996
The 70B is the number of parameters, and each connection in a human brain has at least one parameter. So it's 100T, not 100B. Don't make it sound like we're still anywhere close.
>>
>>101502594
But is it at the level of an 8x7b that's been squeezed to fit on a 24gb card with 32k context?
>>
>>101502837
>inb4 llama-3.5 bitnet 3B trained on 6 gorillion tokens that is better than gpt-5 (with 8k context)
>>
>>101502767
>does the model go wacky beyond 30k input context as anons reported
Which anon said that? Its the first local model ive used that stays together over 32k.
>>
>>101502825
Miau!
>>
>>101502874
It's a bit dumber than 8x7b, and you need to fight or swipe enough times if you want its line of thinking to go a certain way. If you're a fan of spontaneous elements creeping in that still makes sense, it's great at that.
>>
llama3-400B-mini
>>
>>101502874
128K context.
>>
File: 405b 8 fucking k.png (52 KB, 1059x929)
>405b
>8k
Captcha: 4444
>>
>>101502914
ok so mistral > gemma 27b
mixtral > mistral
but mixtral sucks
and gemma is great
>>
>>101502964
More importantly, functional context. We had models that marketed themselves as 32k but were barely usable at 8k, or completely shit the bed at 16-20k and the output became total garbage.
>>
>>101502964
I thought Mixtral only had 32k
>>
>>101503014
32k was for Mixtral... he means the new Mistral.
>>
>>101503014
mistral nemo is 128k that works.
>>
>>101502874(me)
Trying again
Mistral-Nemo fits in 24gb at 8bpw with full context
Mixtral 8x7b has to be quantized to less than 5bpw to fit in the same 24gb of vram with full context
The assumption is that at the same bpw, Mixtral beats Nemo. However, will that change after Mixtral's been quantized to fit on the card?
>>
>>101503092
Remember that it has its own formatting. Dont just use the regular mistral one.

>>101501499
>>
>>101503092
Also, use a lower temp than you would with mixtral. Mixtral needed a high temp to be creative, nemo needs a lower one to not go off the rails. Though it can be fun with high temp depending on the card.
>>
>>101499758
Add -j <NUMBER OF CORES HERE> to the make/cmake call as described in the README in order to use multithreaded compilation.
>>
>>101503092
Mixtral 8x7b is at most 3.7bpw to fit into 24GB with 16k context.
>>
Finding the temp balance for Mistral is a challenge. But I like it better than Llama 3, finetunes, and Gemma. For anons with 12 GB of VRAM, it is pretty good, definitely an improvement over what we had.
>>
>>101503687
How high can you crank the context limit on it with 12GB?
>>
>>101503833
depends on quant
>>
>>101503833
With GGUF Q6 41/41 layers with no KV offload 256 Blas, I tried 64k, and it uses 10 GB of RAM, so with KV offload, I think you can go to 128k, no problem, since it does not take RAM. Without offloading, it will be maybe 16k, but then you can also quantize kv cache, which I have not tried. Anyway, with the setting, I have 6 t/s speed when I fill my context to 24 k on RTX 4070. For myself, that is fully acceptable, and I can see myself using it to a point where the speed goes down to 2 t/s. But for many, even 6 t/s is likely something no no.
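If anyone wants the napkin math for where the cache memory goes, here it is with my recollection of Nemo's config (40 layers, 8 KV heads, head_dim 128 - double check against config.json):
[code]
# KV cache size = 2 (K and V) * layers * kv_heads * head_dim * context * bytes/elem
layers, kv_heads, head_dim = 40, 8, 128
ctx = 64 * 1024               # 64k tokens
bytes_per_elem = 2            # fp16 cache; ~1 for q8_0, ~0.5 for q4

kv_bytes = 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem
print(kv_bytes / 1024**3)     # 10.0 GiB at fp16, so ~5 GiB at 8-bit, ~2.5 GiB at 4-bit
[/code]
Which lines up with the ~10 GB figure above for 64k.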
>>
>>101504007
VRAM. fuck me.
>>
Does nemo work on llamacpp
>>
>>101503092
what do you mean by full context? 128k or what? did you quantize kv cache?
>>
>>101504007
Hmm, I've gotten pretty used to 35 t/s on Llama3 8B Q6, guess I'll have to see how slow I can handle. Sounds promising though, feel like 64k context or so would be the sweet spot for my stories
>>
>>101504119
One could try the smaller Quants. It is likely possible to run it in 32K context with better speed. For myself, if I ever get to a point where the speed or output sucks, I just summarize the story and continue fresh.
>>
Does anyone here use mistral's inference library?
>>
>>101501855
>sam altman says he might loosen filter for violence and sex
>actually finds a way to block jailbreaks
kino if true
>>
>>101502899
this one >>101500262
>>
Will Nemo Mistral be irrelevant in 1 week?
>>
https://xeeter.com/AlpinDale/status/1814814551449244058
>Have confirmed that there's 8B, 70B, and 405B. First two are distilled from 405B. 128k (131k in base-10) context. 405b can't draw a unicorn. Instruct tune might be safety aligned. The architecture is unchanged from llama 3.
>>
>>101502821
is that Vllm or exllama or what?
>>
>>101497246
Cohere's VP of research is named Sarah Hooker.
>>
>>101504944
Distilled from 405b? So they are completely different from the weights we have now?
>>
>>101499054
>1 parameter can take many parameters as input and send it's output to many other parameters
You retarded subhuman don't even know what these words you are using mean.
I recommend euthanasia.
>>
>>101499061
wizard vicuna was among the best at its time
>>
>>101504944
Distilled? Does it mean there is 0% "harmful" data in 70b and 8b models? Oh no no no...
>>
>>101499223
that's among the most embarrassing dunning-kruger gibberish i've read here.
>>
>>101504944
How big is the difference between gemma 9b and gemma 27b? I guess that's useful as a guess for the difference between distilled 70b and 8b
>>
>>101505101
I'm pretty sure gemma 9B is distilled from the 27B and it has plenty of harmful data, stop dooming about shit you don't know.
>>
>>101505150
Gemma isn't distilled. Both models were trained separately on different datasets.
>>
>>101499258
How did you compile it for the first time but now somehow are stumped on how to compile it a second time after pulling? Are you braindead?
>>101499315
you retarded subhuman, the other braindead subhuman is talking about llama.cpp.
I hope you computer-illiterate niggers are replaced by llms soon.
>>101499758
My cpu is 7 years old. Last time I recompiled it took 3-5 minutes maybe. It will only recompile the parts that need to. But even a full compile of everything shouldn't take half an hour. That sounds like bullshit, unless they've added tons of new unhinged bloat utilities.
>>
>>101505165
https://www.reddit.com/r/LocalLLaMA/comments/1dpwi3x/gemma_2_9b_model_was_trained_with_knowledge/
https://medium.com/@nabilw/gemma-2-knowledge-distillation-llama-agents-and-more-ai-updates-2ea4a409c1ba
https://huggingface.co/blog/gemma2
>According to the Gemma 2 tech report, knowledge distillation was used to pre-train the 9B model, while the 27B model was pre-trained from scratch.
come again?
>>
>>101505189
>pre-train
bruh
>>
>>101505209
>For post-training, the Gemma 2 team generated a diverse set of completions from a teacher (unspecified in the report, but presumably Gemini Ultra), and then trained the student models on this synthetic data with SFT. This is the basis of many open models, such as Zephyr and OpenHermes, which are trained entirely on synthetic data from larger LLMs.
https://huggingface.co/blog/gemma2#knowledge-distillation
bruh yourself
>>
>>101500061
your illiterate use of past and present tense probably also contributes to the confusion of the model
>>
>>101505220
so they were trained on different datasets
>>
>>101505165
>Gemma isn't distilled
>>101505235
...
>>
>>101505259
yes, >>101505189 implies they were not
>>
>>101505283
My argument is: a model pretty much fully distilled from a bigger one (Gemma 9B) can still have harmful data, contrary to what the doomer above was posting. I don't care if it's different from the 27B; that has nothing to do with what I said.
I never even mentioned the 27B itself being distilled.
>>
>>101505209
the pre-training is the most important part by far
>>
>>101505187
>you retarded subhuman, the other braindead subhuman is talking about llama.cpp.
Are there a lot of people here who talk like this, or is it just one who is very vocal?
>>
File: 1721564511981.jpg (422 KB, 1554x2176)
>48gb mini-vramchad soon
what's the best model that can fit on 2x24gb?
>>
To know that we'll soon have kobold-compatible ggufs for Nemo... it's a heady sensation.
>>
>>101505491
Gemma.
>>
>>101505516
There is already a branch that you can use for it.
>>
So, they retrained Llama3 8B and 70B but still couldn't come up with some intermediate size?
>>
>>101505582
A 23B model would have been nice and made the model lineup roughly follow a geometric progression in size up to 70B.
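The arithmetic behind that, since the endpoints 8B and 70B come from the lineup itself: the middle term of a geometric progression is just the geometric mean.
[code]
# Geometric mean of 8B and 70B ~= 23.7B, hence "a 23B model would have been nice".
r = (70 / 8) ** 0.5      # common ratio ~= 2.96
print(8 * r)             # ~23.66 (billion parameters)
[/code]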
>>
File: 19525689461a.jpg (89 KB, 400x400)
>>101505552
i get that you're trolling, but at least use something more convincing
>>
>>101505118
>How big is the difference between gemma 9b and gemma 27b?
27
-9
= 18
the age you must be to post here.
>>
>>101504944
>source: trust me bro
>>
>>101505445
I think it's just me, but I've not been very active here in the past few weeks.
>>
>>101505636
Nah, you'll understand when you try everything else in the 48 bracket.
>>
>>101498176
>More
yep there's the dopamine
I post logs now and again, not too often.
there was something about oral onahole saber that was hard to resist

>>101498650
Edited in, I wanted a recurring phrase that stuck in people's head, apparently it worked
>>
>>101505636
If you're an exlless P40 plebian he ain't wrong, Gemma 27B 8-bit roped to 16K is in the sweet spot between speed and accuracy for those.
>>
nemo vs phi 3?
>>
>>101506007
No, rope scaling doesn't work well with Gemma.
>>
If some anons here don't know, right now you can trialscul GCP for access to Claude models, $150 with no CC and $300 with CC (it's free credit anyway, they won't bill you).

https://github.com/cg-dot/vertexai-cf-workers

Should be useful for some dataset generation with 3.5 Sonnet to improve local models.
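A minimal sketch of what that generation loop can look like, assuming the official anthropic SDK rather than the cf-workers proxy (swap the client setup for Vertex if you're using the trial credits; the seed prompts and filenames here are placeholders):
[code]
# Generate (instruction, response) pairs with 3.5 Sonnet and dump them to JSONL
# for finetuning a local model later.
import json
import anthropic

client = anthropic.Anthropic(api_key="sk-ant-...")  # placeholder key

seed_prompts = [
    "Write a short, vivid scene description for a fantasy tavern.",
    "Explain KV cache quantization to a beginner in three sentences.",
]

with open("synthetic.jsonl", "w", encoding="utf-8") as f:
    for prompt in seed_prompts:
        msg = client.messages.create(
            model="claude-3-5-sonnet-20240620",
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        f.write(json.dumps({"instruction": prompt,
                            "response": msg.content[0].text}) + "\n")
[/code]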
>>
>>101506335
$150 with 3.5 Sonnet ($3/$15 for 1M) is enough for a couple thousand generations with decent context and output tokens.
>>
>>101506335
In this house:
-Claude is over-rated pajeet shit.
-You're mentally ill.
-'Tutoring' doesn't work.
-You need to fuck off back to /aicg/ you pathetic good for nothing locust.
>>
>>101506387
>-Claude is over-rated pajeet shit.
If you actually think this way, it's over for you. Have you ever tried 3.5 Sonnet? It's a god at programming and assistant tasks.
>>
>>101506387
>-'Tutoring' doesn't work.
Then why does the Phi series of models exist? Surely if you have a smaller more specific task, using 3.5 Sonnet to generate high-quality examples of output is going to help you improve a local model a lot, even if you don't have billions of tokens.
>>
>>101506398
>Then why does the Phi series of models exist?
microsoft pr
>>
>>101506410
So why is Phi-3 so good at a lot of tasks despite being so small?
>>
>>101506413
>So why is Phi-3 so good at a lot of tasks
which ones robert?
>>
>>101506185
Correct but for standard slop ERP 8-bit works fine enough upwards to 16K, though not always all the way.
For anything else I agree - stick to 8K.
>>
>>101506007
dual 7900xtx
will be 3 next month
>>
does nemo work on kobold yet?
>>
Is 5090/5080 going to be a significant upgrade? 4090 wasn't that much better than the 3090/80 for our uses but I'm far from an expert.
>>
>>101506425
There's no reason to do that when Nemo exists.
>>
>>101506466
I do not think so. These cards are mostly for gamers, and there is nothing really that would challenge them. At best we can hope for more VRAM in the 5070/5080 cards.
>>
>>101506742
That being said, not all is negative in the HW space. Intel and their CPUs are planning to do what Apple is doing with their chips, so if Nvidia keeps fucking us over, there may still be light at the end of the tunnel.
>>
Nemo 24B when?
>>
>>101506466
>>101506742
>>101506763
nvidia is a fat pig ready to be roasted. They're complacent and lazy because they literally create money. The demand vs supply is so insane that they probably don't even give a flying fuck if they lose the AI mega VRAM market share niche battle. And they will.
>>
Need help. So oobabooga's llama.cpp has a regular version and an HF version (just like exl2 and exl do). So when building CUDA llama.cpp (not ooba, just standalone llama.cpp), is it defaulting to HF samplers when I use it as a backend with SillyTavern as the front end? Is there a command I have to use when loading the model to make it use HF samplers? Or does it happen automatically when I load a model in a folder with HF samplers (like the ones you download off ooba)?
>>
>>101506827
Why are you using ooba anyway?
>>
>>101506862
He isn't.
>>
>>101506798
We still haven't got the instruct/base versions of the 22B model MistralAI used for Codestral and Mixtral 8x22B.
>>
>>101506878
Phew.
>>
>>101506799
>>101506763
Alright, but when?
Do I just snag 2 x 3080 in the meantime and be happy? Or do I wait it out?
>>
what's wrong with ooba
is the dev a pinko?
>>
wait 2 more months for the flood of cheap 32gb v100s
>>
>>101506926
No one can tell you that. It could be a year, 3 years, 5 years. If you need LLM coom now, then suck it up I guess.
>>
>>101506926
I would likely wait. The 30/70b models we have now are not that much better than the 12/9/8b models. But ultimately, it depends on what card you have now and if you think it is worth having a 10–20% better experience.
>>
>>101506997
>>101506971
Alright cheers mates
>>
>>101507132
>>101507132
>>101507132
>>
>>101506466
5090 is supposed to have at least 40% more memory bandwidth




All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.