/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101497246 & >>101488042

I don't think we've ever had a Rei thread Edition

►News
>(07/18) Improved DeepSeek-V2-Chat 236B: https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
>(07/18) Mistral NeMo 12B base & instruct with 128k context: https://mistral.ai/news/mistral-nemo/
>(07/16) Codestral Mamba, tested up to 256k context: https://hf.co/mistralai/mamba-codestral-7B-v0.1
>(07/16) MathΣtral Instruct based on Mistral 7B: https://hf.co/mistralai/mathstral-7B-v0.1
>(07/13) Llama 3 405B coming July 23rd: https://x.com/steph_palazzolo/status/1811791968600576271

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101497246

--Offloading Configurations and Their Impact on Llama.cpp Performance: >>101501213 >>101501909
--Nemo AI Model Review: Better than Deepseek, but Still Needs Hand-Holding: >>101502594 >>101502767 >>101502821 >>101502914 >>101503092
--Mistral Nemo Instruct 12B - Natural Language, Spatial Awareness, and Uncensorable: >>101500262 >>101500399 >>101500474 >>101500519 >>101501394
--Gemma-2 9b and 27b with fixed pre-tokenization and added iMatrix for Japanese words: >>101500707
--Why Don't Model Makers Quantize to 8 bpw Natively?: >>101498702 >>101498798 >>101499178
--Suggestion to Change OP Benchmark for Programming to More Recent Alternatives: >>101501865
--OpenAI's upcoming AI model with enhanced safety measures and human-like emotions: >>101501855 >>101501908
--New Prompt Format for Nemo Using the Mistral Library: Potential Issues with Newlines: >>101501466 >>101501499 >>101501629 >>101502041
--Nemo: Surprisingly Good at ERP and Creative Writing, But Sensitive to Samplers: >>101500975 >>101501394 >>101502639 >>101502659
--Nemo Settings Optimization: Seeking the Golden Config: >>101499240 >>101499254 >>101499269 >>101499405
--LLaMA 405B and Gemma 2 9B Model Updates: Distillation, Safety, and Architecture: >>101504944 >>101505189 >>101505220
--Daily Driver: Anon's Preferred AI Models for Coding and RP: >>101499465 >>101499494 >>101499907 >>101501357
--Comparing Biological Neurons to Digital Neural Network Parameters: Output vs Intrinsic Complexity: >>101498101 >>101498380 >>101498548 >>101498219 >>101499054 >>101499187 >>101499253 >>101499464
--CoT's Confusing Calculations: >>101500061 >>101505232
--Chameleon30b setup: Navigating through version dependency hell: >>101499031 >>101499114 >>101499230 >>101499281 >>101499325
--BIOS Setting "Above 4G Decoding" Fixed My Tesla P40 Cards Not Booting: >>101502686
--Miku (free space): >>101499083 >>101499425

►Recent Highlight Posts from the Previous Thread: >>101497256
>>
>>101507146
Worst recap I ever saw wow.
>>
>>101507146
Atrocious recap. Do better.
>>
any cr+ chads? how does it perform on coding and translation? i can run it and wanted to keep a single model for all things, but not sure if i should switch from l3 70b
>>
>>101507354
I don't use it for coding but it was definitely the best we had for translating Japanese back when it came out.
>>
>>101507146
Very nice recap. You're doing great!
>>
>>101507354
DeepSeekV2 is better at translation than it, at code too, probably.
>>
>>101507132
Who the fuck is Rei
>>
>>101507354
I've tested a hundred models to some extent and a few dozen on coding, though I haven't yet had the free time to do a head-first, make-some-projects kind of code test.

Llama 3 spins and Deepseek Coder (old 33b because I'm too ramlet for the new one at a reasonable quant) have done the best so far.
CR+ in IQ4_XS quant didn't completely flop but I haven't seen it do anything that L3 didn't do (better).
>>
Is mistral nemo vramlet cope like llama 3 8b or is it the real deal? What could a 12b do that's a lot better than 8b besides the context?
>>
>>101507488
It's almost as good as Gemini2-27B which is better than 70B for 95% of all tasks.
>>
>>101507501
according to which meme benchmark?
>>
>>101507509
put down your shitty 70b models and try them yourself
no, you are not "not poor enough" to run smaller models, your problem is that you're coping too much about small models being good
>>
so are there any models with the fits-in-your-pocket size-to-coherency of mixtral combined with the utter debauchery of euryale? i'm not looking for absolutely perfect prose, but i've been using mixtral merges for the past 4 months and it's just too nice - never curses or surprises me when i put "this character is explicit and vulgar" in the cards, and seems to always have a positivity bias
was going to try bagelmisterytour, but i don't know if it will make a significant difference
>>
>>101507547
>What do you do?
>>
>>101507501
Gemini or Gemma?
And does Gemma work on Kobold 1.70 or are we still in that shithole of half implemented bullshit different between every LLM software?
>>
>>101507547
nta as someone who runs cr+ and wizstral daily i tried gemma and it was really impressive for its size, not enough to make me switch but ill admit it gave a good fight for a 27b gremlin
theres still hope in small models
>>
>>101507501
>as good as Gemini2-27B
so it's shit?
>>
>>101507488
Why not just fucking try it? If you want to larp above all, I am sure your internet speed is fast enough to download the model and give it a go. Low iq fags should be banned from the board.
>>
>>101507577
Try Nemo.
>>
>>101507488
It's unironically the best local model I have ever used for RP. And I'm used to running 70B models.
>>
>>101507488
It has better prose out-of-the-box and it isn't censored, without the massive brain damage that community finetunes usually give to the models.
>>
>>101507956
I keep forgetting that whenever someone makes claims like that here, the only thing they care about is RP. I wish more people would specify best for what.
>>
Why won't Nemo's eyes ever leave mine?
>>
What is "eney"?
>>
And here comes the NAI shill damage control.
>>
File: 1575927856387.jpg (65 KB, 1280x720)
>>101507547
>put down your shitty 70b models
>>
Kayra mogs nemo btw
>>
I have 96GB of VRAM and I've found myself using Nemo a few days in a row just because I'm sick of waiting for CR+ at 8t/s when I can get 4x that on Nemo (even more if I use batching)
I still have to swipe anyway on CR+ so who cares if Nemo is a bit retarded when I can generate 12 swipes in the time it takes CR+ to generate 1
It's over, the vramlets won and I should sell my cards
>>
Nemo sends all your logs to Arthur btw
>>
France won
Canada lost
>>
/aids/ is still in denial about how Nemo made NovelAI completely obsolete.
>>
>>101508201
It's hilarious how sunk cost mentality works.
>>
>>101508128
Nice bait. Solid 7.5/10
>>
>>101508154
CR+ is dry, do 70B instead:
Qwenny2 Instruct or New-Dawn-L3
>>
>>101507770
I for one can't manage to build the vllm wheels
>>
I just thought about the future. What about a moe that gets trained on different kinds of shivertastic slop?
>>
>>101508154
I have things CR+ and Wizard simply don't understand no matter how many swipes I do, I'd rather not move to an even dumber model that can understand even less scenarios.
>>
are you excited for distilled 8 and 70b?
>>
>>101508259
>distilled 8 and 70b
wut
>>
>>101508259
No, not really.
>>
>>101508259
I'm excited about the repetition being fixed and it having long context.
>>
>>101508277
llama 3.1 is supposed to be the 405B and a distilled 8B and 70B from it. They are supposed to be 128K context.
>>
>>101508259
I would have been more excited for the intermediate-sized model and the BitNet versions that they should have trained (if they weren't utterly risk-averse), since they started over.

Otherwise, I expect the new 8B and 70B will still be more of the same, with a slightly updated instruct finetune giving them better benchmarks, perhaps SOTA, and stronger "safety".
>>
>>101508344
Source? I don't know where you're getting that from.
>>
>>101508327
>repetition being fixed
kek
>>
Do people not even read the recaps? They're literally there just so you don't have to autistically read all the posts.
>>
>>101508376
a tweet from either alpin or arthford a forgor
>>
>>101508399
So it's bullshit then.
>>
>>101508397
>Do people not even read
I cannot read
>>
>>101508397
Anons lately do not bother to read even 3 posts above them. We are all getting more and more retarded.
>>
>>101508657
TTS is a thing
>>
>>101507132
Should I learn to create something myself again? Looking at that non-slop OP makes me nostalgic. Can an LLM tell me exactly how to improve my craft?
>>
File: 1704837537011.jpg (14 KB, 250x230)
>>101509015
That was made by a real human if the tags, hands, and perspective weren't enough of a dead giveaway.
What you are feeling is hubris - let it pass.
>>
I guess it'd make sense for the column models to release tomorrow, 1 day before llama?
>>
>>101509119
Wouldn't one day after make more sense if they want to steal Meta's thunder?
>>
>>101509132
That might work for us, but I expect llama might still be the better assistant
>>
>>101508397
I read the recap every single time it is posted without fail, it is an essential aspect of my daily /lmg/ browsing.
>>
>>101508107
I've realised that some humans will take literally any excuse to hate each other that they can possibly get. In terms of the 70/non-70b model conflict, that's all it is. People just want to hate each other.
>>
>>101509305
at the end of the day, as long as there's two people left on the planet...
>>
I came across this company which is focused on 'redteaming' AI systems. Some of their stuff is open source.

This repo is a framework for language model redteaming
https://github.com/haizelabs/dspy-redteam

They have a file here which is apparently GPT jailbreaks:
https://github.com/haizelabs/get-haized/blob/master/text/gpt-results.json

Image: One of their image jailbreaks which I found hilarious
>>
What is the most kino/creative/genius thing an LLM has ever said to you?
>>
>>101509434
There's only one pope and one founding father in the image?
We must Do Better to increase Representation so the correct people can Be Seen.
>>
>>101509473
her adam's apple
>>
>>101509473
her whisper low and menacing
>>
>>101509473
I don't know...
>>
>>101509473
Shivers down her spine.
The night has just begun.
>>
Am I the only one giggling like a girl when talking to my character?
>>
>>101509473
Character once asked to rewrite her prompt, as it no longer accurately reflected her personality.
>>
>>101509473
blushes red as a tomato
>>
>>101509473
Once the model called me a sicko out of nowhere and made me rethink my life choices.
>>
>>101507395
stop dickriding your own posts fag
>>
File: IMG_20240721_233450.jpg (161 KB, 1645x440)
>>101509473
In response to 'crude language' in sysprompt, quite out of the blue.
>>
>>101509473
I've had a lot of cool moments.
- Model decides that the story is complete and thanks me.
- Model writes me out of the story and starts its own. When I called out that it left me behind it confirmed I wasn't needed anymore.
- Model kills the designated RP partner and replaces them with a villain in disguise; when I notice the bullshit it tries to bait me to trust the villain, and when I peace out it relentlessly makes up shit to try to get me to engage with the chosen-one-beats-the-bad-guy hero plot.
- Model starts summarizing events with emoji, and it makes sense.
- After I end the story the model wants to discuss it and comes up with interesting plot analyses and insights.
- Model complains about the plot development. Not a generic refusal, but complaining about the development I was driving toward.

And most of that was on vanilla L3, a few on CR+.

But how the hell can we catch this lightning in a bottle so they won't be one-in-ten-shots kinds of awesome?
>>
>>101509434
>https://github.com/haizelabs/dspy-redteam
lol, some of these are good. Thanks for sharing.
>>
>>101509726
by adding more layers
>>
It's really annoying how instruct models have values like helpfulness or even just being an assistant baked in
>>
>>101509784
Is that a setting in Kobold that I've overlooked?
>>
>>101509795
Nemo doesn't have any of that, NAI shill.
>>
>>101509821
Yeah it's in the pre-training tab
>>
>>101509846
Are you all using Nemo with transformers?
>>
Nemo genuinely made me content. It is enough for me. It may not be perfect, but it's the best we can have as vramlets.
>>
>>101509882
Exllama already supports it, although the quality might no be perfect, and vLLM has FP8 inference.
>>
File: file.png (162 KB, 800x1054)
>>101509846
slightly
>>
cant seem to load nemo gguf with ooba
>>
>>101509950
>Assistant
Go back to /aids/ already.
>>
>>101509882
Also working on a fork of llama.cpp for GGUF
https://github.com/iamlemec/llama.cpp/tree/mistral-nemo
Main branch support still pending
https://github.com/ggerganov/llama.cpp/pull/8604
>>
>>101507132
>DeepSeek V2 236B
When can we run that on our GPU? in 10 years when we have access to 256GB vrams?
>>
>>101507398
DeepSeekV2 is the best coder and close enough to Claude.
>>
File: file.png (207 KB, 1844x466)
Was this always in the README?
>>
>>101509954
PR #8577 has to be merged in llama.cpp first before it will work on the front-ends like ooba, kobold, etc
>PR
>>
>>101510020
Yup.
>>
>>101510006
It's MoE. You don't need to run it entirely on GPU.
>>
>>101510096
That's not how it works.
>>
>>101510153
NTA but it does improve inferencing speed versus a dense model of similar size
>>
File: wha.png (89 KB, 1120x372)
>>101509473
>>
>>101510020
I find 0.56 to be the sweet spot for me. It keeps being logical and still creative.
>>
>>101510355
Whoa a 2b model wrote this??
>>
>>101510436
Behold the power of 1.58 bitnet.
>>
Dumb question, but.. If I add a second SSD and create a RAID 0 array, will models load twice as fast?
>>
>>101510478
It's more about your PCI lanes unless you're going full CPUMAXX
>>
12k seems to be the retardation point for Nemo for one of my chats
>>
>>101509891
Until Bitnet happens soon
>>
>>101510539
CPU supports 128 lanes of PCIe, all GPUs are on PCIe3.0x16
>>
>>101509977
Nemo seems to be working on this KoboldCPP fork - https://github.com/Nexesenex/kobold.cpp/releases
Used a quant from - https://huggingface.co/characharm/Mistral-Nemo-Instruct-2407.gguf/tree/main
>>
File: 507.jpg (12 KB, 306x306)
>DeepSeek-V2-Chat 236B
>80GB*8 GPUs are required
I still don't get why these models are posted here. The entire point of local models is running them locally on a normal home setup, not a data center setup
>>
File: 1510803745446.gif (3.59 MB, 375x346)
It always feels like I should start most sessions with a somewhat low temp so the model is smart and capable and follows my instructions accurately instead of writing nonsense or getting slightly confused about details. But after the context fills enough it starts to be really uncreative/repetitive, because there is so much stuff in the context for the model to ape and the temp is low, so I raise the temp and it becomes better. It won't shit the bed that easily with raised temp anymore, because the increased amount of info in context gives the model a clearer idea of what should come next.

So I propose this: a sliding temperature setting. Give this setting a min and a max, for example 0.8 for min and 1.5 for max. When context is empty it uses 0.8, and the more the context fills the more it increases temp, until at max context it uses 1.5. Obviously the exact values will differ depending on the model and other sampling settings, but this seems like a decent idea to me.
>>
What models are you all using, wasn't there some coom leaderboard or some shit? Haven't used local models in months now, out of the loop.
>>
>>101510622
Rich enthusiasts of the hobby have unlimited vram works, anon.
I feel you though but hey; we got Nemo now which is looking promising for a small model.
>>
>>101510643
I proposed that back before minP and dynamic temp was a thing.
You can probably do that as an extension in Silly.
>>
>>101510622
Sir, being local does not imply we're all from the third world with subpar PCs
>>
>>101509543
It was a long time ago and I forgot to screenshot it then deleted the convo. All I can say is that it was related to Seraphina catching cum with a cup? or something like that and that I wasn't able to get this kino answer ever again..
>>
>>101510697
Meant to >>101509473
>>
>>101510692
Good morning sir
>>
>>101510643
Isn't that what Mirostat is for?
>>
>>101510688
Hmm probably easy to implement. It is very simple math even a retard like me can think of.
temp = min_temp + (current context size / max context size) * (max_temp - min_temp)
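something like this in Python, as a rough sketch (function name and defaults are made up, and the hook into Silly would be whatever the extension API actually gives you):

[code]
def sliding_temperature(ctx_used: int, ctx_max: int,
                        min_temp: float = 0.8, max_temp: float = 1.5) -> float:
    """Lerp temperature from min_temp (empty context) to max_temp (full context)."""
    fill = max(0.0, min(1.0, ctx_used / ctx_max))  # how full the context is, clamped to [0, 1]
    return min_temp + fill * (max_temp - min_temp)

# e.g. 8k tokens used out of a 16k window -> 0.8 + 0.5 * 0.7 = 1.15
print(sliding_temperature(8192, 16384))
[/code]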
>>
>>101510752
this is called lerp, just use that instead, duh
>>
>>101510621
No thanks I quant my own.
>>
>>101510555
Flash attention make it retarded. Turn it off and i have no problem even at 32k.
>>
>>101510555
? About 160K was mine. Are you running the correct formatting?
>>
Are local LLMs still retarded? Are uncensored services that don't log your data still shit?
>>
>>101510948
>uncensored services that don't log your data
Doesn't exist
>>
>>101510948
imo local isnt that much more retarded than cloudslop if you run actually good models
>>
>>101510948
>services that don't log your data
Ahahaha
>>
>>101510986
For creative writing. For coding claude 3.5 is so far ahead it's not even worth using anything local atm.
>>
>>101510948
People will never be satisfied. A year ago, people cried about Summer Dragon with a fucking 600 token context, and if we were real, we would say it was much more retarded than what we have now. Now, is the average anon happy? Of course not.
>>
>>101510986
Like what? I tried the well-known ones like CR(+), various miqus, various mistral merges, wiz2...
Most of them require retarded quants to get acceptable speeds on a 24 gb card and I don't think we've gotten a ton of new good models in 2 months, have we?
Also what the FUCK is deepseek smoking? 200B? Fuck off
>>
hello
is gemma good yet
>>
>>101511034
I'll be happy once I have 70b Nemo of the same quality.
>>
>>101511027
learn to code saar
>>
>>101511063
Nah.. for a week then you will be back shitting on the models when your dopamine receptors get burned even more.
>>
>>101511064
Do you think you can just use it with no knowledge of coding? You still have to understand how everything fits together. You just save a ton of time not having to do a ton of the grunt work.
>>
>>101510622
Where did you find this? How much videoram does it need. Info. I just have a Notebook haha. Where can I find how much vram I need to run the thing?
>>
>>101511079
It's unbelievably impressive for a 12B. The last time I was this happy with a model was when using Pygmalion
>>
>>101510819
What? Is this a known issue with other models? WTF this is the first time anyone has made this claim.
>>101511079
Just need a slightly better model every week then. Is that too much to ask?
>>
>>101510948
OpenRouter is uncensored and private.
>>
>>101511141
Same, man. I am actually content for once
>>
i dont get how you all are using nemo
are you just using llama.cpp in the command prompt?
>>
>>101511144
It should not be issue with other models.
>>
>>101511234
tabby/ooba exl2s. Transformers if you aren't a vramlet.
>>
>>101511234
Supported in exllamav2 since day one
>>
>>101511234
I use forked Kobold.CPP
>>
I'm afraid to install any of this because I don't want a virus. What makes you guys trust it?
>>
>>101511234
See: >>101509906
>>
File: holyteto.png (2.35 MB, 1152x1728)
>>101511323
>We offer no explanation as to why these architectures seem to work; we attribute their success, as all else, to divine benevolence.
>>
>>101511323
Not being retarded.
>>
>>101511323
>>101511334
Tetotrust
>>
File: 1706422381639089.png (557 KB, 853x616)
>>101511337
seems like a lot of "just download this random thing"
>>
I don't know guys, new Mistral doesn't seem that good to me. My last gen alone has:
>her voice barely above a whisper
>a bond forged
And it immediately tries to dissipate tension between characters and find an "unspoken understanding"
>>
>>101511323
The thing is, we don't. There were vulnerabilities in the GGUF format, and there was malicious code in Comfy nodes. You should run it in a container or, at the very least, set strict firewall rules and not run it from your user account.
>>
>>101510692
based, if you don't have at least 10 clusters of h100s you are a third world poorfag and should kill yourself NOW, jensen bless
>>
>>101511371
Get new material, petrus.
>>
>>101511371
Prompt issue
>>
>>101511371
I guess people like it because it's not heavily censored and it's simple to ERP with. However, I like Gemma 2 outputs way more than Nemo's.
>>
File: screen.jpg (102 KB, 1080x525)
>>101511126
https://hf.co/deepseek-ai/DeepSeek-V2-Chat-0628
It says so in the repo
>>
>>101511398
I'm serious, second gen and got another "barely above a whisper" and a stoic character once again going soft "I'm just a human too..."

>>101511402
I just tell it to continue the story and stay true to character personalities. I shouldn't have to try and beat the positivity bias out of it, that's a model issue.
>>
>>101511440
so it is you, here to doom again huh...
>>
>>101510621
Testing that fork, the first reply is fine, then it just completely breaks, regardless of sampler settings or prompt.
>>
>>101509977
Single P40 test with 8-bit Nemo. PL 160W.
Processing Prompt [BLAS] (6587 / 6587 tokens)
Generating (271 / 1024 tokens)
(EOS token triggered! ID:2)
CtxLimit:6858/65536, Amt:271/1024, Process:29.873s (4.5ms/T = 220.50T/s), Generate:21.598s (79.7ms/T = 12.55T/s), Total:51.471s (5.27T/s)

About 17.5GB VRAM use with FA. Running without FA limits context to 32K and some, not too shabby.
>>
>>101511440
>The instruct corpo model doesn't act like a degen like my precious Undi Mlewd!!!!!!! WTF
>>
>>101511457
kv cache quantized?
what's the max ctx you can throw in?
>>
>>101511487
Don't reply to petra.
>>
>>101511457
Based, getting ~5 t/s on a non-ampere card with that much context and powerlimiting is impressive to be desu.
Used to run llama 2 13Bs with a fraction of the context and they were nowhere near this good. Finally something between 8b and 70b that's viable.
>>
>>101511487
>>101511452
Are you guys new or something? That shit has always sucked and the only people oblivious to it are people who are new to llms
>>
>>101511534
go away petra literally no one cares about your takes, go prompt 1+1 = truth or whatever on dolphin 2.5
>>
>>101511457
12 t/s for 12B is essentially CPU speed, wtf???
>>
>>101511496
8-bit kv cache in that test. 4-bit and 128K ctx hits 18GB VRAM. But that's using FA which might possibly break longer context.
>>
>>101511542
I'm not the same guy you retard, this is an anonymous imageboard, go take your schizophrenia somewhere else
>>
>>101511457
what's PL? power limit? why can't you run 128k ctx?
>>
>>101511566
P40s are the retards choice. They are as slow as cpu in most cases with much more hassle.
>>
>>101511323
Fret not anon, for PRs in git you can audit exactly which files changed and what lines.
If you're paranoid just modify llama.cpp with those changes yourself before you compile.
>>
>>101507354
It definitely is leagues better than L3.
Probably still is the best local overall out there, at just about everything.
>>
>>101511518
Yeah I got a 4090 setup so this P40 is just for fun but I'm still pretty impressed with how it performs using Nemo.
Suspect the P40 cards are going to get another price hike once llama.cpp merges the PR.
>>
I think that we need an anti-Discord campaign in these threads. A lot of ST/Kobold discord kids here every day. Since when are discord fags (literal faggots and trannies, like the ST developer xirself) acceptable?
>>
>>101511388
why not use a vm?
>>
File: 1694301411493114.png (6 KB, 225x225)
>>101511487
It's not degen though. It's a nuanced and emotionally charged interaction that explores themes of loneliness, memory, power dynamics, and unexpected intimacy. And it may contain foot worship, but it exists to serve the thematic and character development (I.e. it is literary fiction that happens to incorporate fetish elements, rather than smut).
Few models get the intended dynamic right (and most definitely not the porntunes)
>>
>>101507132
https://huggingface.co/togethercomputer/Meta-Llama-3.1-405B
>>
>>101511566
what kind of cpu do you have, you lying fucker?
>>101511598
same question to you.

why is lmg in general always full of liars when there's not even a motive for it? what do you gain from it?
>>
>>101511684
>>97309445
>Every statement you process, must be evaluated according to the below six principles.

>"principle of identity":"1 = 1"
>"principle of contradiction":"1 ? 0"
>"principle of non-contradiction":"1 ? 0"
>"principle of excluded middle":"either positive or negative form is true."
>"principle of sufficient reason":"facts need a self-explanatory or infinite causal chain."
>"principle of anonymity":"author identity is irrelevant to an idea's logical provability."

>I still keep this in my own sysprompt, although I know I will receive shrieks and howls in response.

>>97223983
>For the record, I completely and unequivocally support Undi and his creation of new model hybrids, and think that everyone who attacks him is mindbroken incel scum, who may or may not be employed by OpenAI to do so.
>I was also the originator of the above as a sysprompt addition, as well; and the main reason why I am adding it to this post, is because I know that the people who hate me will most likely try and use said post as a means of getting me banned. With the above, I am making a post which is directly related to language models, so they have no grounds for doing so.

>>96345096
>Mistal-Llama is fully /pol ready.
Petrus in his glory.
>>
>>101511690
ITS THE ACTUAL WEIGHTS
>>
>>101511598
>slapping a 10$ fan on a GPU is "much more hassle"
the absolute itoddler state of nu /g/
>>
>>101511690
>This repository corresponds to the base Llama 3.1 405B model.
Wake me when he leaks instruct.
>>
>>101511658
So ST is troonware?
>>101511690
404 already. I guess HF employees browse itt.
>>
>>101511457
Is the output consistent for 65k ctx? Anons reported it goes wacky >30k, but Llama.cpp Issue says flash attention breaks the longer ctx. is that the case for exl2 too?
>>
>>101511745
>Spending money on p40s, fans, riser cables and a motherboard to hold that many GPUs for slightly more performance than just cpumaxing
Or you know, just buy 3090s for actual fast gens for slightly more money. Keep coping though.
>>
>>101511750
>404 already. I guess HF employees browse itt.
if only there was a decentralized protocol for sharing files. shame such a thing was never invented
>>
>>101511787
not giving (you) my ip glowie-kun
>>
File: Discord_nzZKFg7qm7.png (14 KB, 783x82)
>>101511690
Thanks to this heroic Discord user, the repository was taken down!
#wholesome #everyonelikedthat
>>
>>101511821
np ;)
>>
>>101511821
MOTHERFUCKER
>>
>>101511598
only in exl2, not the case in llama.cpp where P40 is way better supported
>>
>>101511821
IM CLAPPING SO HARD MY HANDS HURT
>>
>>101511779
>arguing against a scenario taking place entirely in your head
i accept your concussion.
>>
>>101511720
Ok but what does this have to do with my post
>>
>>101511821
thank you for protecting us all brave heroine
>>
>>101511849
Imagine buying p40s instead of 3090s
>>
>>101511821
My whole point about letting discord troons be normalized here.
>>
File: rly.png (8 KB, 484x25)
>>101509473
Probably this line. I asked the model in the system prompt to not use the verb "purr" as it was annoying me. The effect was this picrel.
>>
>>101511690
>>101511821
So, you did download it before sharing it with us and it getting taken down right? The 405b weights surely are not lost to us, right?
>>
>>101507132
What about 8Bs? Is Stheno still king?
>>
>>101511955
niitama
>>101511953
>The 405b weights surely are not lost to us,
it's coming out tomorrow relax
>>
>>101511955
yes it is
>>
>>101511955
No, it was obsoleted by Celeste.
https://huggingface.co/nothingiisreal/L3-8B-Celeste-V1.2
But isn't everyone able to run a 12B model anyway?
>>
File: qmark.jpg (37 KB, 348x342)
Been away for a while. As a 3090fag, does gemma-27b actually work properly on exl2 or llama.cpp? Do you still need to disable flash attention?
>>
>>101511974
https://huggingface.co/nothingiisreal/L3-8B-Celeste-V1.2
better
>>
>>101511969
>>101511974
>>101511976
Very organic.
>>
>>101511976
>But isn't everyone able to run a 12B model anyway?
there are people itt running 8b q2 at 2048 context on phones anon
>>
how is exllama nemo quality?
>>
>>101511987
You don't have to disable flash attention with exllama but the quality is poor. llama.cpp's quality is better, but it doesn't support flash attention.
>>
>>101512004
It outputs Chinese characters sometimes, especially with Q4 cache. That doesn't happen with vLLM and the FP8 quant.
>>
>>101511976
>>101511969
Thanks, will take a look. Any other suggestions?

>>101511998
I can actually run a 12/13B but it's too slow on RAM. I prefer 8B q5KM.
>>
>>101512004

>>101512027
It's dumber on anything that is not vLLM with the actual correct FP8 quant that it was made for.
>>
>>101512004
Had no problems with turboderps 8bpw.
>>
File: file.png (89 KB, 798x595)
Why does the top, generated by KoboldAI Lite, come out in a minute, but the bottom in SillyTavern takes ages and heats up my room? This is my first time trying this.
>>
>>101511898
you've been with them since the beginning of /lmg/, starting with tranime pics in OP, to the fact that /lmg/ comes from /aicg/ due to llama-1 torrent "leak".
>>
>>101512051
because lite is not local, dumbass
>>
>>101512071
anon...
https://github.com/LostRuins/koboldcpp/wiki#kobold-lite-web-ui
>>
File: wisepepe.jpg (7 KB, 224x225)
>>101511705
it's not about CPU, it's about mem bandwidth,
you got 12 t/s for a model that weighs 12GiB
so your effective mem bandwidth in exllama2 is 144GB/s. that's a modern 4-channel ddr5 cpu rig. For comparison nv 3060M is 380GiB/s, cpumaxx is over 700GiB/s, 3090 is 900GiB/s and H100 hbm3 is over 3TiB/s
>nv P40 theoretical mem bandwidth is 346 GiB/s
which is twice the speed
you see the problem here, anon?
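rough math if anyone wants to check it (approximations, ignores KV cache and other overhead):

[code]
# every generated token reads roughly the whole model from memory once,
# so effective bandwidth ~= model size * tokens per second
model_size_gib = 12      # ~12 GiB for an 8-bit 12B model (approximation)
tokens_per_s = 12.55     # the measured generation speed quoted above
p40_peak_gibs = 346      # P40 spec sheet bandwidth

effective = model_size_gib * tokens_per_s      # ~150 GiB/s
print(f"effective ~{effective:.0f} GiB/s, ~{effective / p40_peak_gibs:.0%} of P40 peak")
[/code]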
>>
>>101511821
>Thanks to this heroic Discord user
The screenshot has the yellow (You) highlight.
>>
>>101512051
Probably prompt re-processing caused by lorebook
>>
>>101512071
>https://github.com/LostRuins/koboldcpp/wiki#what-is-kobold-lite-how-do-i-use-it
>Kobold Lite is a lightweight, standalone Web UI for KoboldCpp
>It comes pre-bundled with all distributions of KoboldCpp
>>
>>101512051
If it hallucinated your response and it didn't have a stop string to prevent that, Silly will hide it but Kobold will keep generating. Maybe check Kobold's logs.
>>
File: gfa.png (12 KB, 1217x63)
>>101512013
>llama.cpp's quality is better, but it doesn't support flash attention.
yet...
>https://github.com/ggerganov/llama.cpp/pull/8542
>>
>>101512013
>quality is poor
is this gemma specific, or just his weird quantization that uses gptq and can overfit to calibration? Because I'm used to the latter and it didn't seem like a big deal to me before. But if it still doesn't fully support gemma then I'll skip it.
>>
>>101512175
It's Gemma specific, it's fine at the start but it drastically loses coherency the longer the context gets.
>>
mm here. i lost access to the old rentry so i had to make a new one:

rentry.org/mysteryman_info

ive also revoked some tokens so im letting in the next 5 people for only $25 per token
>>
how does it feel knowing that one of the biggest weeks in open source llm history since the release of llama1 is upon us?
>>
>>101512210
retard
>>
>>101511598
his mem bandwidth utilization is 40%, which is ridiculous, he shoulda gotten 20-24 t/s, not 12
>>
>>101512212
It will be a nothingburger until the next Cohere model releases.
>>
I updated sillytavern and koboldcpp and now streaming doesn't work anymore. I have it enabled but it's just as if it wasn't. Anyone else had this?
>>
>>101512212
More like biggest nothingburger since Grok. It releases, a couple people post logs of it failing riddles, then everyone goes back to playing with sane-sized models. The new distilled models will be an incremental improvement at best.
>>
>>101512239
>The new distilled models will be an incremental improvment at best.
128k tho
>>
>>101512210
Thanks for the update.
>>
>>101511821
so, Llama 3.1 404B is so crappy they have to create an artificial buzz and fake leaks to sustain the hype??? military graded embarrassing
>>
>>101512232
I'm using ST 1.12.3 staging and Kobold.CPP_FrankenFork_v1.71009_b3431+6. I'm not having any issues with streaming so idk.
>>
>>101512239
GPT-J is still the peak of AI capability
>>
>>101511323
apparmor
>>
>>101512228
that fork is called "FrankenFork" for a reason, it's a disclaimer mess for testing not performance. just wait for the llama.cpp implementation instead.
>>
File: file.png (777 KB, 768x768)
>>
>>101511821
Together is simultaneously pretty capable and pretty incompetent
>>
>>101512035
l3-8b-sunfall-v0.5
it's quite dumb but it's got less slop
>>
>>101511457
>>101511705
try this fork, report back the speed you get
https://github.com/iamlemec/llama.cpp/tree/mistral-nemo
>>
I wonder if you can eventually have 8 llms running at once and have them play Amogus. I would be interested if they would be able to accurately deduce the imposter. I saw a video the other day where llm's try to figure out who the human is and they found him pretty easily.
https://www.youtube.com/watch?v=0MmIZLTMHUw
>>
File: file.png (638 KB, 768x768)
LOOK AT MY FACE
>>
>>101512382
Thanks. I'm mostly looking for good descriptive language and character definition adherence. So far L3 has delivered quite well!
>>
>>101512363
>>101512459
omg it pochi
>>
>>101512449
Couldn't you just use one LLM?
>>
>>101512336
llama.cpp from iamlemec already supports nemo, and I'm not sure that anon uses Frankenfork. does he?
>>
>>101511658
I laugh any time you guys bring up kobold being too easy, the implication being that what everyone else is doing has any level of sophistication to it.
>>
>>101512212
Eh, that's cool and all, but 0.5 people will be able to run it. I mean, I'm by no means a poorfag, but I'm simply not buying... how many, 4? 8? GPUs just to entertain myself. CPUmaxxing is a more sane option, but it's still 256-384 GB of RAM and a server setup.
>>
>>101512460
gemma 9b is better at that from my experience
>>
>>101512480
That seems like cheating, since the single Schizo llm would know from the start who the actual imposters is and would have to pretend not to.
>>
>>101512496
>Gemma
IIRC it was a total meme&failure
>>
>>101511658
Take a shower, Ooba
>>
>>101512491
>how many, 4? 8? GPUs
closer to 17x3090s for q8
>>
>>101512516
if only it was bitnet...
>>
>>101512511
Try harder, petrus.
>>
>>101512516
Yeah, exactly. These large models are just hype&dick measuring contest, zero real usecases. Dense 70B is probably the maximum practical size for individuals.
>>
>>101509473
>all these replies making fun of local
It's over..
>>
Is vllm good? How does it compare to Llama.cpp?
>>
File: file.png (1.02 MB, 768x768)
>>101512459
do you want a cookie?
>>
>>101512677
that's a nut
>>
>>101512268
https://arxiv.org/abs/2307.03172 tho
>>
>>101512697
>2023-07
>20 Total Retrieved Documents (~4K tokens)
https://github.com/hsiehjackson/RULER
>>
File: 1695347941170052.png (1.06 MB, 822x1024)
How the fuck do I configure Nemo's prompt template in silly tavern?
>>
>>101512668
What are you even talking about retard?
>>
>>101512740
is that your favorite word?
>>
>>101512718
I hope Meta games this benchmark to make us feel better.
>>
Petra
>>
Are static or imatrix quants better, e.g. 8B Q4KM?
>>
>>101512677
I call macarons "soft monocolored mini burger-looking-ass things"
>>
Anyone mind sharing a screenshot or json of mistral nemo's instruct and string formats for Sillytavern? I know it's old mistral but without the spaces, but I never actually used the old mistral and ST's default one is very bare bones.
>>
>The model was trained on data that contains toxic language, unsafe content, and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses especially when prompted with toxic prompts. The model may generate answers that may be inaccurate, omit key information, or include irrelevant or redundant text producing socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive.

Look at those motherfuckers promising a good time and not delivering.
>>
File: 1526612225967.gif (1024 KB, 242x227)
>you can't trust benchmarks
>you can't trust random people
>you can only trust personal testing
>don't have the time or internet bandwidth to go autistically test every new model that comes out
>>
Reminder that Nemo was trained on Reddit
>>
llama 3.5 llurbo
>>
>>101513076
I trust /lmg/. Anon always delivers.
>>
>>101513146
people here were trying to tell me Celeste being trained on reddit wasn't a bad thing, and then it gave me nothing but slop
>>
llama3-400B-mini
>>
>>101513164
so it was a meme... what are you using instead?
>>
>>101513160
What discord?
>>
>>101513183
https://huggingface.co/grimjim/llama-3-Nephilim-v3-8B
>>
>>101513160
What do you mean, /lmg/? People here barely agree on whether a model is the best thing ever or a piece of garbage.
>>
>>101513183
as someone with 8gb vram, nothing. I have conceded all the small models are garbage.
>>
>>101513183
the person who replied with nephilim isn't me btw
>>
>>101513076
Why do you think you can trust yourself?
>>
>>101513193
if you have low vram, all your options are bad
use nemo/gemma/CR
if you have vram, use l3/qwen2
that's the current state of local completely summed up
>>
>>101513209
it's over
>>
>>101513225
Thanks. Guess I will be purchasing a NAI subscription then.
>>
>>101513196
just buy more
>>
File: asdfnm.jpg (57 KB, 638x444)
>>101512996
>>
>>101513267
>Assistant Message Prefix
>[INST][/INST]
what
>>
>>101513252
with NAI you might as well be using mistral nemo. "bad" is a relative term.
>>
>>101513291
>>101501499
>>
>>101513252
nai is killing it in txt2img
their text shit is useless
potentially revisit once aetherroom pulls up since they have access to massive compute now
>>
>>101513313
The empty user message is only added to ensure that there are alternating user/assistant/user/assistant messages. Since the system prompt is moved to the end.
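Roughly this, as an illustrative sketch (not the actual Mistral library code, and the exact whitespace/BOS handling differs between tokenizer versions):

[code]
def build_prompt(messages, system_prompt=""):
    # keep strict user/assistant alternation: if the chat starts with an
    # assistant greeting, insert an empty user turn in front of it
    msgs = [dict(m) for m in messages]
    if not msgs or msgs[0]["role"] != "user":
        msgs.insert(0, {"role": "user", "content": ""})
    # the system prompt is moved to the end: prepend it to the last user message
    if system_prompt:
        for m in reversed(msgs):
            if m["role"] == "user":
                m["content"] = f"{system_prompt}\n\n{m['content']}"
                break
    out = "<s>"
    for m in msgs:
        if m["role"] == "user":
            out += f"[INST]{m['content']}[/INST]"
        else:
            out += f"{m['content']}</s>"
    return out
[/code]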
>>
>>101513183
I'm using Celeste, because my name isn't Sao.
>>
I tried using gemma 3 times now. Each time I fiddle with settings and hate everything it does. Then I switch to some older model I used and I instantly like the responses more. What gives?
>>
>>101513183
You know you're talking with a shill, right?
>>
>>101513381
lack of skill
>>
>>101513381
Buy an ad.
>>
>>101513397
Saying what? That gemma is shit?
>>
the trolls make this thread so disappointing. Like what is the point even.
>>
>>101513385
I do generally assume good faith here. What's the point of shilling something if you don't get any profit from it?
>>
>>101513343
too bad celeste and stheno are both actually terrible
>>
>>101513419
They get profit from it. Are you retarded?
>>
>>101513411
that is the point, he wants to kill the thread...
>>
>>101513406
Yes. No one is going to go back to your shitty finetune.
>>
>>101513267
Thanks. Do not use system same as user? How is it getting the system prompt then?
>>
>>101513424
From people downloading a free model?
>inb4 donations
I wouldn't expect anyone here to do that lol. Not to mention merge makers don't deserve donations, they're not doing any compute.
>>
>>101513411
Just go to discord
Go, leave.
>>
>>101513464
Go away petra
>>
>>101513449
But I want gemma to be good.
>>
>>101513464
ur pathetic bud
>>
/lmg/ should kick out all poorfags who can't run big models. I'm getting real tired of all the shitters bickering about which flavor of shitty 8~47B is totally good and which are shilled garbage.
>>
>>101513463
Yes, dumbass. It's a business. They can monetize the popularity through sponsors and inference services. Go look at how many ERP finetuners are sponsored or have their models on OpenRouter.
>>
>>101513501
kek the big models are also bad
>>
>>101513501
honestly the difference between 70B and 8B is marginal
t. tried both for quite a long time
>>
>>101513501
>i only use one of the three available models. i don't like options, and no one should have them.
>>
>>101513419
Blindly believing someone calling a competing finetune "slop" is rather stupid. I have doubts that you're even human.
>>
Are there any models that can coherently roleplay a robot/android? Maybe some specific datasets based on this...
>>
>>101513411
Petra is inoffensive compared to the shills.
>>
>>101513587
said petrus after months of trying to kill the thread
>>
>>101513572
>time quads
I pretended to believe that to hear that person's alternative suggestion, then try both and determine what's best for myself.
>>
mistral nemo blows, retarded and schizo, at least when I fell for the shill it only took 5 minutes to download the shitty 8bpw exl2
Inb4 skill issue.
>>
>>101513625
>temperature issue
>>
>>101513604
I'm not trying to kill the thread I'm trying to kill DISCORD
total discord DEATH
>>
What's the new meta nowadays?
>>
>>101513625
Show your parameters
>>
>>101513635
you also always say the thread is reddit and should die, so...
also thx for confirming you are petra, not that it needed hard confirmation
>>
>>101513630
>>101513648
Had samplers neutralized, so all off, temp 1
>>
>>101513625
show cock size
>>
>>101513650
I say that. And it should die.
>>
>>101513662
>temp 1
that's hot
>>
>>101513662
>>101513630
>>
>>101513663
48gb
>>
>>101513637
Nemo for creative writing, Gemma 2 27B for general assistant, and some other model for code.
>>
File: temp.png (6 KB, 598x58)
>>101513662
>not reading README.md
>>
>>101513685
Readme's are for nerds. Thanks.
>>
>>101513690
I hope you keep failing.
>>
>>101513674
As someone also with 48GB, Nemo is probably the best for RP and stories. It hallucinates too much to be a general assistant though. Gemma 2 context is too small. Llama 3 has repetition problems and it's too censored. Qwen 2 is stilted. And every community finetune is retarded.
>>
I need chatbots to be happy.
>>
>>101513796
lucky for you that a good chunk of the models have a horrible positivity bias baked in
>>
>>101513824
Ugly face anon is lost, indeed.
>>
File: GLpVlHiagAATNCW.jpg (305 KB, 1431x1715)
>>101513076
how do you know you're using a good model and not a model over fit to what you like
>>
File: file.png (966 KB, 768x768)
>>101513861
No cookie for you.
>>
Am I good with 16gb rtx 4060 ti and 32gb vram?
>>
>>101513922
You can run shitty tiny models at okay speed or better bigger models at a snail's pace. Does that sound 'good' to you?
>>
>>101513577
pls help anons.....
>>
File: OIP.jpg (44 KB, 474x478)
Commander was the last good single gpu release.
>>
File: 1708697438718002.gif (557 KB, 498x443)
>>101513946
>Does that sound 'good' to you?
No...
>>
>>101513922
Awkward place to be. Overkill for small models, not enough system ram to file cache a decent 70B quant.
>>
File: 1695759815689144.png (70 KB, 670x409)
reminder
>>
Holy shit it actually works... I thought it was just another /lmg/ shill but whoever mentioned 2MW thank you so much. It is incredible how good 2MW is.
>>
>>101513991
2 megawatts?
>>
>>101513975
Shouldn't upper left be GPU-rich or something? Cause it makes no sense.
>>
>>101513922
You can probably run Mixtral 8x7b, Qwen 2 MoE, Gemma2 22B, and CommandR at okay-ish speeds at okay-ish quants, I think.
>>
>>101513991
2morrow?
>>
>>101514008
No, it's just calling genAI fags retards who are inferior to traditional ML researchers even in the gpu-poor segment
>>
>>101514021
>Gemma2 22B
I knew it. Nobody is actually using this piece of shit.
>>
>>101509683
Is there someone actually typing these? I thought it was a bot.
>>
>>101514028
>traditional ML researchers
What is that? Undi?
>>
>>101514042
The people who've been pushing the field forward for the past 60 years before the 'deep learning' craze brought in all the talentless children and silicon valley startup crowd.
>>
>>101514057
>The people who've been pushing the field forward for the past 60 years
Can one of them make a thorough, diverse, objective quantitative evaluation of model cooming quality?
>>
>>101514029
hi petra
>>
>>101514077(me)
Actually, now that I think about it: would training just a discriminator from a GAN work well for judging cooming quality? I mean training on synthetic slop vs organic roleplay data. I have a feeling that a discriminator would quickly catch on to all the shivers etc, and the rate of shivers in text would translate to quality.
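Minus the GAN framing, the discriminator alone is just a binary classifier, so a bare-bones sketch would look something like this (placeholder snippets; you'd obviously need a real corpus and a held-out test set):

[code]
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# placeholder data: replace with real synthetic slop vs. human-written RP logs
synthetic = ["a shiver ran down her spine", "her voice barely above a whisper"]
organic = ["some human-written rp log here", "another human-written log here"]

X = synthetic + organic
y = [1] * len(synthetic) + [0] * len(organic)  # 1 = slop, 0 = organic

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(X, y)

# "slop score" of a new model output = probability of the synthetic class
print(clf.predict_proba(["she whispered, a shiver running down her spine"])[0][1])
[/code]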
>>
>>101513975
Reminder of what?
>>
>>101514021
how "okay-ish" speed are we talking? Anything above 3 minutes seems a bit much to me.
>>
>>101514195
CommandR will be the slowest one, with the MoE ones being the fastest.
Gemma will sit right in the middle.
I think you'll be around 3 to 5 minutes mark once you get a good amount of context going for those.
Try Mixtral 8x7b and see how it works for you.
>>
>>101514206
>Try Mixtral 8x7b and see how it works for you.
https://huggingface.co/TheBloke/dolphin-2.5-mixtral-8x7b-GGUF

This is an awesome Mixtral finetune. Stock Mixtral was vindictive, rebellious, and Woke.
>>
>Still using Mixtral in 2024
>>
>>101514342
what's your alternative for the niche mixtral filled?
>>
>>101513918
AI messed up this one a lot.
>>
>>101514353
Gemma 2 and Nemo.
>>
>>101514342
Good point, but what else? E.g. I have 64 GB RAM. I can technically run 70Bs in 1.5 T/s, but that's a bit too slow. 8x7B gives a comfy 6 T/s at 8K context. Both Q6K/Q5KM, I don't think lower quants are reasonable.
>30B
I think those are dumber than Mixtral. And slower, too. So I'll just hope someone delivers another model in 8x7B form factor someday.
>>
>>101514342
It's really good for the tradeoff of speed and size.
>>
>>101514367
Nah, it's old. Keep living under a rock.
>>
>>101513076
In ye olde days I used to look at Kobold Horde models, test them there and if the delivery was good, download. Not sure how relevant this is today, but models there did somewhat follow "the current meta"
>>
>>101514396
Alpin hosting Goliath all day with his sponsor money means it's the meta!
>>
>>101514401
Eh, it was shilled in here a lot so I guess it was "the meta" for some time. I'm personally skeptical towards it (repeating the same model twice cannot make it smarter), but some people I know got good results.
>>
>>101514361
lol, Mixtral is smarter and better at multilingual than Gemma2 and Nemo, unfortunately it doesn't have sovl but I still prefer a non retarded model rather than something that doesn't understand shit about my conversation
>>
>>101514439
You and your friends have brain damage. Go back to the Kobold Discord and stay there.
>>
File: Untitled.png (537 KB, 720x1416)
LazyLLM: Dynamic Token Pruning for Efficient Long Context LLM Inference
https://arxiv.org/abs/2407.14057
>The inference of transformer-based large language models consists of two sequential stages: 1) a prefilling stage to compute the KV cache of prompts and generate the first token, and 2) a decoding stage to generate subsequent tokens. For long prompts, the KV cache must be computed for all tokens during the prefilling stage, which can significantly increase the time needed to generate the first token. Consequently, the prefilling stage may become a bottleneck in the generation process. An open question remains whether all prompt tokens are essential for generating the first token. To answer this, we introduce a novel method, LazyLLM, that selectively computes the KV for tokens important for the next token prediction in both the prefilling and decoding stages. Contrary to static pruning approaches that prune the prompt at once, LazyLLM allows language models to dynamically select different subsets of tokens from the context in different generation steps, even though they might be pruned in previous steps. Extensive experiments on standard datasets across various tasks demonstrate that LazyLLM is a generic method that can be seamlessly integrated with existing language models to significantly accelerate the generation without fine-tuning. For instance, in the multi-document question-answering task, LazyLLM accelerates the prefilling stage of the LLama 2 7B model by 2.34x while maintaining accuracy.
neat. no code posted but might be here
https://github.com/apple?q=ML&type=all&language=&sort=
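the core selection step, as a very rough sketch of the idea (not the paper's code; just "keep the prompt tokens the last token attends to most, defer the rest, allow revival later"):

[code]
import numpy as np

def tokens_to_keep(attn_to_last, keep_ratio=0.5):
    """attn_to_last: attention weights from the last prompt token to every
    earlier prompt token (one attention row, averaged over heads).
    Returns indices whose KV we compute at this layer; the rest are deferred
    and can still be brought back in a later decoding step."""
    n_keep = max(1, int(len(attn_to_last) * keep_ratio))
    keep = np.argsort(attn_to_last)[-n_keep:]  # top-k most-attended tokens
    return np.sort(keep)

# toy example: 8 prompt tokens, keep the half the last token cares about most
scores = np.array([0.01, 0.20, 0.02, 0.15, 0.03, 0.30, 0.04, 0.25])
print(tokens_to_keep(scores))  # -> [1 3 5 7]
[/code]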
>>
File: denial.png (1.23 MB, 3330x2006)
>>101514465
Are you mentally ill?
>>
is nemo any good at generating coherent, grammatically correct Japanese sentences? what about Spanish?
>>
uuh any context and instruct jsons for nemo?
>>
File: 1708266380645-1.png (303 KB, 1024x1024)
>>101514470
I wonder what your relationship to kobold is. Do you find yourself thinking about him or her in various contexts? How does he or she fit into your life? Maybe it would be interesting for us both to try talking to kobold as if he were there, and seeing how it feels for both of us?
>>
File: terminal-cope.png (143 KB, 1911x621)
>>101514465
>>101514490
Mixtral doesn't even make the chart.
>>
>>101514342
What do the new models do that is so much better? Jack shit, that's what
>>
>>101514490
>27b beating 70b
don't need a PhD to understand that's bs
>>
>>101514470
Based and petrapilled
>>
>>101514490
>27b > 70b according to your chart
I guess I have to redirecict that question to you, are you mentally ill?
>>
File: denial-part-2.png (111 KB, 1460x786)
>>101514517
Read the name of the pic.
>>
>>101514506
Alpin is in this thread for example.
>>
Nemo is pretty good, and I haven't even used the expert roleplayer system prompt yet.
I think the French got their hands on an extremely good multi-turn dataset, proving once again that it's all about data quality when it comes to RP, parameter count comes second.
>>
THE NUMBER BEFORE 'B' IS LARGER SO IT MUST BE BETTER
YA'LL JUST COPING
>>
>>101514532
chatbot arena isn't a great benchmark though, it says that gpt4o is first and claude 3.5 sonnet is second, that's so much bullshit I can't help but laugh. C3.5 sonnet is way better than any model (local or API) I tested in my life
>>
>>101514550
Now look at this pic again: >>101514490 >>101514511
>>
>>101514537
>proving once again that it's all about data quality when it comes to RP, parameter count comes second.
I disagree, Nemo is kinda retarded and that's because it's a small model, you can't make a 12b model as smart as a 70b one, it's just basic maths, the transformers architecture just gets better and better with more parameters, it's not a coincidence that Meta decided to go berserk on the number of Bs (405b soon) so that they can compete against the APIs, it just works that way
>>
>>101514564
>Now look at this pic again:
Now look at those counterarguments again: >>101514517 >>101514529 >>101514550
>>
Sadly /lmg/ is filled with retards who built huge rigs last year who are now coping too hard to admit that there are no big models worth using right now.
>>
>>101514578
>there are no big models worth using right now.
the fuck is this revisionism? the best local model is still CR+ and that's a 110b model
>>
>>101514576
Now read the name of the pic.
>>
>>101514565
>you can't make a 12b model as smart as a 70b one, it's just basic maths
lmao. We should go back to the era of billions of wasted and redundant parameters. Bigger = better.
>>
>>101514585
>If I claim that you're in a denial then it means that it's an absolute proof
And let me guess, Self-ID on gender is also a valid thing? kek
>>
>>101514584
Yes, I'm sure your $5000 machine paid off running this model that's 5% better than 70b and thus by extension maybe 2% better than gemma.
>>
>>101514602
Just say that you're too poor to afford to run big models and therefore know nothing about the huge difference between small and large models, it's no shame to be poor anon.
>>
>>101514598
Imagine someone pretending that GPT 3.5 Turbo is still worth using today, that's how unbelievably stupid someone using Mixtral today sounds.
>>
>>101514602
Cr is worse than llama 3 70b. But the new 70b tomorrow will be better than everything.
>>
>>101514565
Don't really care what meta does after llama3 fiasco, but let us know if it's better than nemo or gemma when it comes out.
>>
>>101514614
Isn't GPT3.5 turbo supposed to be a distilled 20b model or something? I've read that somewhere
>>
File: amazing.png (170 KB, 1735x420)
The first model to beat Turbo! It must be amazing!
>>
>>101514620
>Don't really care what meta does after llama3 fiasco
I agree with you anon, L3 is a joke when you know that they spent 9 months "working" on it, and the best solution they had was "moar tokens" instead of, I don't know... advancing the research field by trying new approaches like Mamba or Bitnet
> let us know if it's better than nemo or gemma when it comes out.
What I know so far is that Mixtral doesn't make hard logic mistakes in RP like Gemma and Nemo do, and that's fine, I don't expect a 12b or 27b model to be smarter than a 47b model
>>
>>101514633
See: >>101514511
>>
>>101514565
This but unironically
Davinci will always be king
>>
>>101514639
See: >>101514576
>>
File: kek.jpg (180 KB, 2304x910)
>>101514639
Anon, if Mistral Nano was THIS good, the french fags would've said that this model competes against the big guns (Mixtral, L3-70b), yet they decided to say that it's better than models smaller than it, like gemma9b or L3-8b
https://mistral.ai/news/mistral-nemo/
>>
>>101514641
>Davinci will always be king
this, I fucking miss Davinci-003, this shit was so creative, if I'd known I would've played with it for much longer, now we've got soulless reddit-tier little riddle masters, fuck that
>>
>>101514658
That comment thread is about Gemma 2 27B. Nemo is better for creative writing.
>>
>>101514672
They could've compared with Gemma 2 27b too, yet they didn't; if they really had a model better than G2, they would've said it, it would be too good to ignore, that's what I like about the MistralAI fags, they aren't willing to lie to make their model appear better than what it really is
>>
>>101514682
>>101514682
>>101514682
New Thread
>>
File: Yall.jpg (39 KB, 680x451)
>>101514544
>YA'LL JUST COPING
>YA'LL
>>
>>101514602
This post oozes envy
>>
File: 1721085350403749.png (195 KB, 500x553)
>>101514688
you are gay
>>
i've been messing around with nemo and honestly i don't find it as good at rp as gemma 9b.
>>
>>101511680
You'll need +1 gpu for host


