/g/ - Technology


File: 1707201855037480.png (2.05 MB, 1792x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101636887 & >>101628398

►News
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>101636887

--Paper: Meta-Rewarding language models paper sparks discussion on superintelligence and censorship: >>101638118 >>101638197 >>101640693
--Speculative token decoding discussion, benefits and challenges: >>101637618 >>101637764 >>101637766 >>101637699 >>101637785 >>101637855 >>101637909 >>101638033 >>101638191 >>101638004 >>101638070 >>101638131 >>101638151 >>101638181 >>101638262 >>101638304 >>101640593 >>101640704 >>101640769 >>101641008 >>101641053 >>101641163
--Quantized KV cache and RPC issues with CUDA: >>101637007 >>101637166 >>101637788 >>101638420 >>101638565
--Optimizing MoE models with GPU offloading: >>101637198 >>101637597 >>101637655
--Nemo 12B model performance with 16gb vram and context size limitations: >>101641028 >>101641077 >>101641188 >>101641235 >>101641318 >>101641411 >>101641433 >>101641249
--Local alternatives to character.ai and model benchmarking discussion: >>101640281 >>101640417 >>101640465 >>101640502 >>101640537 >>101640592 >>101640468
--LLMTree and ST allow branching conversations: >>101642024 >>101642041
--Anon rants about Largestral's repetitive writing style, others discuss workarounds and limitations of emulating authors' styles: >>101639479 >>101639536 >>101639624 >>101639682 >>101640309 >>101640761 >>101640789 >>101640839 >>101640913 >>101640806
--SDXL Lightning model for text2img2vid and potential use cases: >>101638766 >>101638823 >>101638875
--Hospitals using AI tools and data security concerns: >>101637309 >>101637432 >>101637622 >>101637627 >>101638445
--Chatbot arena update: >>101642285
--Looking for monitoring solution for OpenAI API: >>101637457 >>101637540 >>101637626 >>101641833
--Miku (free space): >>101637264 >>101637453 >>101637772 >>101637940 >>101638162 >>101638280 >>101640481 >>101641758 >>101642120 >>101642736 >>101642752 >>101642754 >>101642831 >>101642918 >>101643012 >>101643032 >>101643065

►Recent Highlight Posts from the Previous Thread: >>101636892
>>
>>101643089
>6 fingers on the left hand
its over
>>
Could somebody share some cards with complex scenarios, please?
>>
>>101643089
>not looking at the camera
>6 fingers
>LNG
>clover with 5 leaves
mikufags have low standards
>>
>>101643162
Yeah, >>101642831 was better idk why OP didn't pick it
>>
>>101643160
Pokemon Breeding Wall and Tokiko come to mind.
Also, Dark and Darker.
There was also that one fallout rpg.
>>
Oh, are we allowed to post dalle miku gens again
I haven't made one in a long time though, here's an oldie I never posted, from when we were doing this prompt.
>>
>>101643176
>Oh, are we allowed to post dalle miku gens again
Yes, but try to refrain from vivid style slop and do more creative gens
>>
>>101643176
you were never disallowed to do so
>>
File: 582.jpg (154 KB, 1600x1064)
What current machine AI handles a solid conversational flow for RP?

Every RP I've tried so far has the same sterile robotic feel that I dislike (they waterboard you with questions that, while individually kinda normal, come at a frequency that instantly reminds you you're talking to a bot)

Really pissing me off. I've been trying Gemma 27B and trying to fine-tune it, and it's super bad for this. Command R is a little better; Mistral Nemo is also pretty bad.

For reference, I have a 4090, so I'm not running any 3x 3090 setups for the actual nutty models
>>
>>101643261
genuinely sounds like a prompt issue if its the same across different models
>>
>>101643176
Pet the Miku
>>
>>101643183
does this count
>>
>>101643293
uhhhhh
>>
File: file.png (349 KB, 1170x1560)
>>101643094
>No (you)s
>>
>>101643269
Is there some sort of guide just to cover the basics? I always make the same basic ass card because there's no point trying any complex ones if even a basic one that's one on one RP gives me such weird results.
>>
Where do people set their DRY settings? Is it bad that I just set them in Silly T itself? Whenever I see settings posted, the UI looks way different from ST's, making me think there's something on kobold I should configure too
>>
>Can remove 20-50% of layers without noticeable drop in quality
https://arxiv.org/abs/2403.17887
I saw this a while ago and I'm surprised there hasn't been more on it. Did it get deboonked? Has anyone tried this with L405 or Misty Large yet?
>>
I'm doing a little bit of experimentation where I run base nemo on koboldcpp and opus on ST, and paste opus' reply to kobold, then have nemo respond back on my behalf.

I have to say that nemo complements opus well, and it's somehow slightly better than opus at ERP.

Too bad it's not that good on its own though.
>>
>>101643525
Go play with llama3 42B or whatever the number was. Let us know.
>>
>>101643389
>i think your first focus should be making an artificial hippocampus.
Stuff like that doesn't help or work, what works is using more compute
>>
>>101643400
What you're talking about is probably Kobold Lite. It's just a frontend, just like SillyTavern
>>
Did anyone try vLLM's CPU off-loading? I tried it for a bit with Mistral Large AWQ on 2x3090. The prompt processing was somewhat decent at, I think, 200-400 T/s, but generation speed was like 0.5 T/s...
>>
>>101643525
I think it's likelier that there is in fact a drop in quality, but current measurement methods aren't very good at capturing it. Then again I didn't read the paper, so I wouldn't know how much they've done to prove their claims.
>>
Are the good parts of nemo there because it is overfitted to hell? And is that kinda how frankenmerges worked too? The model gets input and, instead of actually processing it and trying to generalize, it just pulls out the closest training example, therefore it is schizo but also sounds pretty good and more soulful.
>>
File: 123 (2).png (492 KB, 960x779)
>>101643585
here's a comparison btw. Top is nemo and bottom is opus.

when it comes to ERP, I like nemo's raw style better, if it weren't so retarded i'd prefer it to opus
>>
Are we able to load Mistral Large with reasonable context on 48GB VRAM? I tried a 3.0bpw but it wouldn't even load at the usual 32764 context length. Had to use 8168 context.
>>
For those without a 'standard' setup, how much did you spend on it?
>>
>>101643770
>boobs
>boobs
jesus christ, mistral needs to scrape https://greensdictofslang.com asap

otherwise looks bretty good
>>
What's the most natural sounding chatbot model currently actually runnable?

Because Nemo fucking sucks
>>
>>101643929

https://huggingface.co/ykilcher/gpt-4chan
>>
>>101643929
sorry to hear about your shit taste anon
>>
>>101643963
>>
File: 1697582032922759.png (50 KB, 1190x261)
>>101643946
sir
>>
>>101644008
you can find the torrent
>>
Does anyone have the torrent to llama 3.1 8b?
>>
Post the best response you ever got from a model
>>
>>101644074
no
>>
>>101644008
a tool that does nothing but generate speech, and the society of "free speech" decided to ban it. what a joke
>>
>>101644101
HF had to specifically create this disclaimer for GPT-4chan btw, they literally didn't have it beforehand. I think it's still the only model with that disclaimer on HF.
>>
>>101643963
Stheno >>>>>>>> Nemo for RP coomery
>>
>>101644117
Wrong. https://huggingface.co/nothingiisreal/Celeste-12B-V1.6
>>
>>101644117
stheno is retarded dude
man I'm sick of retards who judge a model entirely on 'sovl' and don't care whether it's fucking moronic and can't keep a scene straight in its head
>>
>>101644132
Not that anon, but I had great success with stheno on some seemingly complicated RPG scenarios.
I did get the whole fucked anatomy from time to time however.
Shit like sucking your dick while you fuck her, that kind of thing.
The thing about Stheno to me is that it's very one note, and horny.
Its style is very baked in, and it deviates very little from it even with clever prompting.
>>
File: ewewewqeweq.jpg (31 KB, 426x341)
If I wanna replicate the conversational flow that AIs have on websites like Character AI, what would be the best model?

So far i've tried Nemo (decent but tends to go pretty schizo quickly no matter the temps), Stheno is super good but I feel like there's probably something better. I have a 3090 + 32GB RAM, so obviously shit like Command R+ or Mistral Large i've not even tried
>>
>>101644093
Okay babe
>>
>>101644132
>retarded
Nigger, Nemo can barely coherently handle a 1-on-1 date scene without constantly trying to seize the initiative over the entire scene, asking me some schizo shit like "how's your dad" or some other random garbage.

What fucking coom scenes do people who shill this new Nemo shite even partake in? Because it's utter ass
>>
Stheno is amazing. I will stick with Llama 3.0.
>>
>>101644174
I agree, it doesn't touch Sao's finetunes at all. That dude alone is the only person advancing the whole ecosystem.
>>
>>101644174
nta but that seems like a normal question to ask someone you're having a conversation with
>>
>>101644174
>some other random garbage.
>not appreciating engaging sovl
poor taste
>>
>>101644195
>>101644199
>first date
>date doesn't even know if i'm fucking batman or not, with no parents
>"How's your dad"
Kys, Nemo will always be dogshit that's literally only shilled because poorfags with 16GB dogshit GPUs can run it decently well. That's it
>>
>>101644153
There's nothing better than Stheno. Come back next year.
>>
>>101644174
there's nothing schizo about someone asking that question during a conversation you autistic retard
>>
101644224
obvious bait, (You) denied
>>
>>101644220
>anon has never been on a date
>>
>>101644225
That's cool, but Stheno doesn't ask that question.
>>
>>101644220
>only shilled because poorfags with 16GB dogshit GPUs can run it decently well
Anon. Stheno is smaller and easier to run than Nemo. What you just said applies to Stheno more than it does Nemo. Are you drunk?
>>
>>101644220
My dude, I've had that asked of me in first dates as many times as not. People usually don't assume my village has been raided by orks.
>>
>>101644245
yea but Nemo is the new flavor of the month
>>
>>101644245
I run Stheno at FP32. It's something that you do when you aren't poor.
>>
>>101644225
Yeah, this is making me realize that a lot of the poor judgement about which models are good in this thread comes down to autistic men literally not understanding how normal human beings talk
so the model says something completely normal, and they think it's "schizo"
>>
>>101644258
>how's your dad
You've never been on a date if you're unironically telling me, without any prior mention of your family, that a girl just asks you "how's your dad".

Not unless you know the girl. Like just think of the logic behind it, what would prompt a girl to ask you how your dad is when she doesn't even know if you're close with your dad or anything about your family situation?

Also don't pretend you've ever even held more than 3 seconds of eye contact with a female in your life.
>>
>>101644269
I run Nemo at FP32. It's something that you do when you aren't poor.
>>
>>101644278
It's deeper than that.
The impression I get from some anons is that anything that deviates from their baseline expectation is schizo or bad. And I'm not talking about quality expectation but behavior, meaning that if a model can't read their mind, it's bad.
>>
>>101644269
It was trained at BF16, so you are a retard wasting compute and electricity for zero possible benefit.
>>
>>101644278
the only female hand you ever held was when your trans dad led you into the closet.
>>
File: 235235432.jpg (21 KB, 438x438)
>>101644323
>>
What version of Nemo are you guys even running?

I haven't checked any of the fine tuned ones and just tried out the instruct GGUF version, seemed ok, but with the way some people talk about it you would think it was GPT4
>>
>>101644342
>No you are, but what am I?
unironically kys.
>>
>>101644357
your hands are shaking rn
>>
>>101644342
>"you are a woman, no matter what this mirror says"
Cute pic anon, happy transitionday
>>
>>101644357
You did the same thing when you tried to turn the accusation of autism back around on him
Incredibly low quality thread hours atm, I blame Americans
>>
>>101644354
I just use the q8 gguf of nemo-instruct, its main advantage is that it's fast, since I don't have much vram. I summarize and hide messages to keep the context below 20-25k for longer term. It's not too stupid but sometimes I have to edit or regen stuff or add notes if it's unable to do something right. This takes less time than running a better model at 0.5T/s or something though.
>>
File: aca.jpg (58 KB, 511x562)
>>101643094
>--Local alternatives to character.ai and model benchmarking discussion:

Wait, is there finally a local alternative to character.ai that is actually usable on rigs that aren't triple GPU setups?
>>
if you wanna try mistral large 2 for free: >>101644574
>>
What are these proxies? Are they truly just free? Who's paying for them?
>>
>>101644663
>Who's paying for them?
People in /aicg/ steal keys from GitHub and other places, and make proxies
>Are they truly just free?
Some yes, some not. And of course they could always be logging your prompts and outputs
>>
>>101644641
Fuck I opened that shit with people next to me
>>
>>101644685
Uohhhhhhhhhhh. Did they sob?
>>
>>101644685
>clicking shit from anonymous in public
le mao
>>
Interesting.
Testing Celeste 12B now.
I ask it the same chain of questions I ask the other models with this card, and it produces a response that's much like the other nemo finetunes, but it fucks up the formatting of a single specific piece of information in the exact same way.
Either this was fine tuned on top of -instruct or that is a characteristic of the base model, which I don't think it is since the other fine tunes (such as mini-magnum) didn't produce this error. Although, it uses ChatML, so it probably was fine tuned on top of base, right?
Anyhow, no thoughts about it so far other than that it was annoying as hell to get the card's prefill working, but it did produce a decent first response with the working version in place.
>>
>>101644641
If you don't want to see japanese kiddie NSFW cartoons in your face, https://seekers-str-garlic-prediction.trycloudflare.com/ (there's still a dubious video, just ignore it)
>>
Is ST still the best frontend? Are you guys using something else?
>>
Is TensorRT-LLM another backend like llama.cpp and vLLM?
>>
>>101644793
>If you don't want to see japanese kiddie NSFW cartoons in your face,
Why wouldn't I want to see that?
>>
File: pure kino.jpg (178 KB, 968x1150)
>>101644637
If Character.AI somehow removed its filter, it would unironically clear every single dogshit model that people spend $6000 on PCs to run.

>AI actually understands sarcasm
>holds a conversation realistically

Utter brutal mogging. Come back in 5 years and maybe we'll be close.
>>
>>101644824
>>AI actually understands sarcasm
Have you tried anything besides local models? That's not something new
>>
>>101644641
how the fuck do you even use it?
>>
>>101644860
see >>101644854
>>
>>101644815
lol
>>
>>101644865
>risk to data privacy

Ooft.. How big is the risk. I'm not gonna get keylogged am I
>>
are there any existing prompt sets for jp > en translation
>>
>>101644894
The risk is that the proxy owner might log all of your prompts + their completions along with your IP, so don't share any private info in your prompts, and better use a VPN.
>>
>>101644904
ah sweet
>>
Is nemo smarter than mixtral 8x7b?
>>
>>101644934
absolutely not lol
>>
>>101644953
Then why is it so popular now? I thought that meant it had surpassed it. And now people say it's only good with 32k context, so may as well keep using mixtral?
>>
>>101644969
It's much smarter, that anon is a lying hater faggot.
>>
>>101644969
It's popular because it's good at the current FoTM fetishes for how resource-friendly it is.
>>
>>101644934
No.
Its saving grace is being a lot smaller and having a long-ass context window, even if it degrades the more you fill it.
It's competition to llama 3 8b, not mixtral 8x7b.
>>
If you're a high roller Mistral Large is probably your best option.
>>
the hospital discussion from last thread is interesting. you can definitely get decent voice transcripts locally even on really cheap phones. i use the futo voice input thing and it works great as long as i speak english.
>>
File: 1694513432075190.png (3 KB, 286x65)
>>101645019
If you don't have a local rig that can run it, 3.5 Sonnet is much, much better than Largestral 2 for almost the same price point (same $3 for input; $15 for output vs Mistral's $9)
>>
>>101645000
>It's much smarter, that anon is a lying hater faggot.
>>
>>101644847
Local models and Silly Tavern are what every faggot shills tho
>>
LlamaCPP and KoboldCPP both do prompt processing only on CPU for Mistral Large. It's so painfully slow, loading 1k tokens takes more than 10 minutes. I want to rope myself. Fix when?
>>
>>101645033
If you're an API paypig you're probably looking for /aicg/
>>
>>101645050
That's your fault for not having enough VRAM to fit the kv cache.
>>
>>101645050
Have you tried… gitting gud? I've been running it on GPU for like 5 days now.
>>
>>101643128
Haha... before I saw that I was thinking that was some sort of Koikatsu posed model, given the flat low-poly look.
>>
>>101645108
Check >>101643167, it's a better version. Would you think it was AI-generated without knowing the resolution?
>>
>>101645087
>>101645096
I have 8GB VRAM, I am using Vulkan and offloading 0 layers. The model size is 38.7GB, I have run bigger models than this. What am I doing wrong?
>>
>>101645013
Interesting, is the best option to run just mixtral-instruct directly instead of one of the other versions people talk about?
>>
>>101645133
Compared to Nemo, if you have the hardware to get the speeds you want, yeah.
Otherwise, if you can run miqu (mistral medium) or mistral large, then those should be better. I can't attest to this first hand since I don't have the hardware, but as far as I know, that's how it goes.
>>
Altman is laughing at us...
>>
>>101645114
>The model size is 38.7GB
Sweaty I...
When you quant a model that much it becomes retarded. If you can't run it at at least 4.25bpw you can't run it. That's the reality. You can make it coherent with things like imatrix etc, but it's just not the same experience at that point.
>>
>>101644847
Stheno understands sarcasm perfectly fine.
>>
>>101645153
What's the smallest quant of miqu that would actually be worthwhile? It's always been pretty slow for me.
>>
>>101645180
he won though, lmsys just proves that most humans are retarded and don't actually read LLM outputs:
https://huggingface.co/spaces/lmsys/gpt-4o-mini_battles
I think they nitpicked specifically bad 3.5 sonnet examples to show it in a bad light, but it still shows that gpt-4o mini just does extremely verbose and long outputs with excessive markdown formatting, and people prefer that over 3.5 sonnet's default of plaintext and only answering the actual question. Also, 3.5 Sonnet does more refusals.
>>
>>101645180
Why is he laughing? Didn't he just post a 5 billion dollar loss?
>>
>>101645191
I am using IQ2_M it's still better than everything else out there.
>>
>>101645215
>Didn't he just post a 5 billion dollar loss?
That's fake news by retarded news outlets
>>
File: 1711880795163244.png (235 KB, 2009x1141)
>>101645214
>>
File: KL-divergence_quants.png (111 KB, 1771x944)
>>101645207
I personally wouldn't go any lower than 4.5bpw~ish/Q4_K_S, but people say that larger models are more resilient to quanting.
I think Q3_some_something is 4bpw~ish?
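If you just want to ballpark what fits: weight file size is roughly params × bpw / 8. A minimal sketch (my rule of thumb, not from the chart; it ignores embedding/metadata overhead, so real files run a bit bigger):
[code]
# Rough quantized-weight file size; treat the result as a lower bound.
def quant_size_gb(params_b: float, bpw: float) -> float:
    return params_b * bpw / 8  # GB ~= billions of bytes

print(quant_size_gb(70, 4.5))   # ~39.4 GB: a 70B at the ~Q4_K_S floor above
print(quant_size_gb(123, 3.0))  # ~46.1 GB: why 3.0bpw Largestral is tight on 48GB
[/code]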
>>
>>101645180
C3.5 Sonnet is way better, that mememark is shit
>>
File: 1708802259809863.png (70 KB, 1547x659)
>>101645214
look at this shit
>>
File: 1705015046019661.png (245 KB, 1901x1049)
gpt4o and gpt4o mini basically write mini-essays for every fucking answer, and le normie AI ENJOYERS enjoy this too much
>>
>>101645222
You wish, sammy boy. Microsluts took everything with the slightest profit potential from OAI and then gave it away for free to non-commercial users as a loss leader and will continue to do so. As far as commercial users go, anyone handling sensitive information can now just run 405B on a single H100 cluster instead of trusting a bunch of pajeet dataminers with it. There's no real way forward for OAI at this point.
>>
https://x.com/ManuVision/status/1818412120373182928
https://x.com/Evinst3in/status/1818423736942342389

Why did zuck not drop the voice mode already?
Images with Chameleon were interesting too and it's been taken out.
I hope we get voice stuff soon. Even for TTS there is almost nothing locally. xtts2 isn't that good.
>>
>>101645266
>There's no real way forward for OAI at this point.
gpt4o mini is $0.15/$0.6
>>
>>101645271
A data breach when a bunch of jeet diversity hires click on a phishing email costs millions, vs. a few hundred thou for an H100 cluster.
>>
>>101645114
llama_kv_cache_init: AMD Radeon RX 6600 XT KV buffer size = 16.00 MiB
llama_kv_cache_init: Vulkan_Host KV buffer size = 1392.00 MiB
llama_new_context_with_model: KV self size = 1408.00 MiB, K (f16): 704.00 MiB, V (f16): 704.00 MiB
llama_new_context_with_model: Vulkan_Host output buffer size = 0.13 MiB
llama_new_context_with_model: AMD Radeon RX 6600 XT compute buffer size = 1700.00 MiB
llama_new_context_with_model: Vulkan_Host compute buffer size = 1696.01 MiB
llama_new_context_with_model: graph nodes = 2822
llama_new_context_with_model: graph splits = 799
Load Text Model OK: True

Everything seems fine yet it's still using the CPU for prompt processing.
>>
>>101645296
I don't know about vulkan but yeah that does seem kind of weird.
>>
What are the potential implications of this PR getting merged?
https://github.com/vllm-project/vllm/pull/5191
>>
>>101645325
The implication is that vLLM will support loading GGUF models
>>
Why does Saltman think sending his jeet shills here is somehow going to get his 5 billion dollar loss back?
>>
>>101645348
Anon, I hate to be that guy, but let's be real... you're coping because OpenAI is successful.
>>
>>101645331
Will it make using GGUFs about 800% faster compared to using llama.cpp?
>>
>>101645358
So successful that they basically had to give GPT4|o away for free...
>>
>>101645267
i've been having fun with piper t b h. it's surprisingly fast.
>>
>>101645296
Have you tried the rocM fork of koboldcpp?
That probably runs better than vulkan.
>>
File: ComfyUI_00073.jpg (1 MB, 2048x2048)
>>101644824
>pic related
>"good"
Local has been better than whatever garbage that is for quite a while
>$6000
Poorfag detected
>>
>>101645414
>Poorfag detected
Give me $6k and I'll stop using corpo models immediately.
>>
>>101645348
He does not need shills. He simply makes unfiltered GPT4 bots argue with each other here to drown out real discussion. He's been doing this for more than a year.
>>
>>101645403
I don't want to boot into loonix every time I wanna RP.
>>
File: 1693951440467878.png (21 KB, 880x213)
People: le humans are so heckin smart compared to models
People in reality: picrel
>>
>>101645358
Shouldn't you be busy deepthroating VC dick instead of posting here, Sam? That $5B hole ain't gonna fill itself without a lot of spit shine.
>>
>>101645420
https://github.com/YellowRoseCx/koboldcpp-rocm/releases
There are pre-compiled Windows binaries.
This is not an endorsement, by the way. I use llama-server with an Nvidia GPU, but you might as well try it.
>>
>>101645431
You're really salty, anon. Why is that? You are only running local models right now thanks to OpenAI, who kickstarted this industry.
>>
>>101645418
>Gibs money plz
You will forever be a poorfag with that attitude
>>101645438
Yes, saars Saltman kicked industry very good. Vishnu bless.
>>
>>101645428
The issue with Arena always will be that there's no real-life use case for it. People will sit down, ask it the Sally question and some basic programming problems, and give the win to whatever sucks their dick enough while doing so. It's yet another worthless benchmark amongst many.
>>
File: pepe-happy.gif (55 KB, 498x498)
>>101645436
HOLY MOLY, IT WORKS!! It's so fast!
Thanks fren!
>>
>>101645527
Have fun.
>>
>>101645495
>people ask for something
>vote for model that gives them what they want / like
>useless because... just because, ok!
>>
>>101644125
>>101644777
Okay, yeah, this is not bad so far.
If this can do the mechanics part of the RPG as well as base nemo can I might keep this one as my main model.
>>
>>101645573
Thanks for your input, GPT4o-mini
>>
>>101645573
>Sam himself acknowledges that L3.1 8B is ahead of OG GPT-4
Based
>>
>>101645610
AHAHAHAHAH
>>
>>101645420
i legitimately can't imagine how bad local stuff must be to run on windows
>>
>>101645419
Nah. I mean he probably does that too. But being a billionaire he's probably anhedonic from all the endless rape orgies and designer drugs. He needs the tactile sensation of actually being there, getting under somebody's skin. He's out there.
>>
>>101645573
Deciding something like this by vote is retarded. There is no wisdom of the crowd.
If collective decision making by voting was viable, there would be no need for markets.
>>
Any anon ran Largestral with dual 4090s? What quant? And is it better than Mixtral 8x7B instruct dual 4090?
>>
>>101645527
On windows + amd?
>>
>>101645664
why on earth would you need dual 4090 for 8x7b lol
>>
>>101645697
Yes!
>>
>>101645650
if we decide everything by consensus I would still need groceries
it's a question of perceived quality (by average users) vs actual quality. why'd you bring up markets?
>>
>>101645733
Oh shiet that's nice, enjoy
>>
>>101645035
I tried it and Mixtral doesn't seem much smarter if at all and it's less willing to do stuff.
>>
>>101645267
Sounds like shit
>>
>>101645821
Yeah, I dont want to talk to a black woman either.
Thats why we need local for the anime finetune.
Video/audio in and audio out seems really cool though. Hope by the end of the year we have that shit too. Dont care if the quality will be worse or whatever.
>>
Oh shit that's right. Udio apparently released a new model I should give it a whirl. I know it's not local, but let's be honest, local is only good for imagegen and text
>>
>>101645878
the 1.5 version one? yeah it's something "new" 2 weeks ago, I indeed noticed an improvement of quality, desu at this point if you give someone a song from udio he won't notice it's AI, that's crazy
>>
>>101645878
Why is /vsg/ in cryostasis? Music industry having too powerful jewish lawyers?
>>
How can I format and place a summary of the events of a previous chat so the model doesn't try to replicate a concise list of events in its replies?
>>
>>101645592
Lots of "testament to this, testament to that" however, even if the general writing is more natural than nemo-instruct.
Interesting.

>>101645914
What model?
I think a system message (via author's notes or w/e) with a header saying that that's a summary of past events is the usual way to do it.
You might need to couple it with some instructions in the First Assistant Output "telling the model" that the actual chat starts at that point.
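Something along these lines has worked for me (the wording and layout are just an illustration, not a canonical ST preset):
[code]
[System note: Summary of earlier events, for reference only.
Do not imitate this list format; continue the roleplay in normal prose.]
- {{summary point 1}}
- {{summary point 2}}
[The chat resumes below.]
[/code]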
>>
>>101645851
>Thats why we need local
Have we even gotten speech to speech (another voice) yet? (p.s. not a tranny)
>>
>>101645893
Two factors:
>a certain someone kept trying to kill it
>it was literally toted on the back of the 'ick on 'eck faggot trying to salvage Tortoise TTS
>>
>>101645934
RVC works quite well for that
>>
File: file.png (16 KB, 424x115)
>>101645944
>https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI
Why the fuck do they have a live time counter lol
>>
>>101645267
ClosedAI is trying to make it really safe, whether for legal and regulatory reasons or because they genuinely believe in that crap (probably both). Same story with basically every other corp training models. No one is going to risk the potential fallout if things go badly, some journo makes you into a controversy, and your stock plummets. The only way we get to the point where corpos release the weights for these kinds of models is for them to already exist elsewhere and to have been normalized. Like how the Llama 1 leak helped FB see that there would be value in releasing it (making it into a product) and that it would not result in huge backlash, since it was basically already out and nothing bad happened to them.
>>
Also fun fact as far as voice goes... I know people want real-time tts and this won't cut it. But Suno 3.5 is half decent at voice duplication if you have a 30-60 second sample of somebody speaking clearly. Then you just continue it with "speech" in the genre field
>>
>>101645325
I observed weird nondeterministic behavior when I used vllm, so I will not be using it until that's fixed. At least with Llama.cpp, I can disable caching to fix that (even though it makes chats slower).
>>
>>101645610
lol wtf
>>
I purposefully avoided this AI RP bot shit since it first came along around 2 years ago, but for some reason I decided to give local models a try a couple days ago and now I'm completely addicted. This shit is like crack. Fuck you anons.
>>
>>101646050
Many such cases.
Welcome to the club.
[fakespoiler]You'll get bored eventually and then it'll just become an occasional thing.[/fakespoiler]
>>
>>101646050

>Have enough social skills/looksmaxxing to ask cute girls for their phone number.
>Text them
>Realize talking to literal NPCs are more interesting.

Quit before it's too late. It's over for me.
>>
>>101646050
You'll have your mind blown away a second time when you try claude and realize local is not worth it
>>
>>101646050
Try 3.5 sonnet from >>101644651 (cunny pic), you'll become addicted to claude
>>
>>101645932
That's probably my problem, I was just leaving a summary in the chat part.
>>
>>101646120
Nta but paying per prompt seems like an ez way to funnel money into the shitter
>>
File: file.png (131 KB, 455x457)
>>101646163
Paying?
>>
>>101643089
Any good models for Japanese to English translations?
>>
File: 1611784850937.jpg (25 KB, 570x367)
>>101646080
>Women sense my power, but I deny them my essence
>>
File: udio fucking won.png (2 KB, 311x167)
>>101645878
holy shit.
Haven't even genned anything yet but Udio fucking won. Suno v4 better have this shit.
>>
>>101646185
Well, using proxies intuitively seems like a bad idea. Don't know any alternatives at that point.
>>
>>101646289
Honestly, I wouldn't mind paying for it (have the money + less rate limits + don't have to change proxies when they get taken down) but I don't want my name and cellphone number associated with ERP lol
>>
>>101646120
>>101646131
Is claude really worth it? Llama 3 Stheno is already enough to get me rock hard and shooting massive loads. Barely ever messes anything up either.
>>
>>101646345
>Is claude really worth it?
Only you can tell us that, if you are happy with your set up then rock on, but you don't lose anything by trying it out with a free proxy
My biggest issue is this >>101646335
>>
>>101646335
so true
>>
>toy model works when doing eval pass in training
>toy model doesn't work in inferencing
AAAAAAAAAAAAAAAA
>>
>>101646345
Buy an ad, sao
>>
File: Untitled.png (159 KB, 720x405)
Physics of Language Models: Part 2.1, Grade-School Math and the Hidden Reasoning Process
https://arxiv.org/abs/2407.20311
>Recent advances in language models have demonstrated their capability to solve mathematical reasoning problems, achieving near-perfect accuracy on grade-school level math benchmarks like GSM8K. In this paper, we formally study how language models solve these problems. We design a series of controlled experiments to address several fundamental questions: (1) Can language models truly develop reasoning skills, or do they simply memorize templates? (2) What is the model's hidden (mental) reasoning process? (3) Do models solve math questions using skills similar to or different from humans? (4) Do models trained on GSM8K-like datasets develop reasoning skills beyond those necessary for solving GSM8K problems? (5) What mental process causes models to make reasoning mistakes? (6) How large or deep must a model be to effectively solve GSM8K-level math questions? Our study uncovers many hidden mechanisms by which language models solve mathematical questions, providing insights that extend beyond current understandings of LLMs.
part 2.1 is up. 2.2 soon I guess. whole series is a good read
https://arxiv.org/abs/2305.13673
https://arxiv.org/abs/2309.14316
https://arxiv.org/abs/2309.14402
https://arxiv.org/abs/2404.05405
>>
>>101646652
Any video presentation? I don't like reading papers.
>>
>>101646664
yeah but ICML copyright claimed it for like 30 days? or something.
https://youtu.be/yBL7J0kgldU
that's the link he posted with working subs. great talk goes over everything even the unpublished 2.2. should be going back up sometime late august. I'll post about it when it does if I remember
>>
File: 2.png (398 KB, 720x1141)
MoFO: Momentum-Filtered Optimizer for Mitigating Forgetting in LLM Fine-Tuning
https://arxiv.org/abs/2407.20999
>Recently, large language models (LLMs) have demonstrated remarkable capabilities in a wide range of tasks. Typically, an LLM is pre-trained on large corpora and subsequently fine-tuned on task-specific datasets. However, during finetuning, LLMs may forget the knowledge acquired in the pretraining stage, leading to a decline in general capabilities. To address this issue, we propose a new fine-tuning algorithm termed Momentum-Filtered Optimizer (MoFO). The key idea of MoFO is to iteratively select and update the model parameters with the largest momentum magnitudes. Compared to full-parameter training, MoFO achieves similar fine-tuning performance while keeping parameters closer to the pre-trained model, thereby mitigating knowledge forgetting. Unlike most existing methods for forgetting mitigation, MoFO combines the following two advantages. First, MoFO does not require access to pre-training data. This makes MoFO particularly suitable for fine-tuning scenarios where pre-training data is unavailable, such as fine-tuning checkpoint-only open-source LLMs. Second, MoFO does not alter the original loss function. This could avoid impairing the model performance on the fine-tuning tasks. We validate MoFO through rigorous convergence analysis and extensive experiments, demonstrating its superiority over existing methods in mitigating forgetting and enhancing fine-tuning performance.
neat. converges closer to pretrained weights so less forgetting
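a toy sketch of the core idea as the abstract describes it (not the authors' code; the keep fraction and the per-tensor top-k selection are my assumptions):
[code]
import torch

def mofo_step(param, grad, momentum, lr=1e-4, beta=0.9, keep=0.1):
    # Accumulate momentum as usual, but only apply updates where the
    # momentum magnitude is in the top `keep` fraction; everything else
    # stays put, keeping the weights close to the pre-trained values.
    momentum.mul_(beta).add_(grad, alpha=1.0 - beta)
    k = max(1, int(keep * momentum.numel()))
    # the k-th largest |momentum| entry serves as the update threshold
    thresh = momentum.abs().flatten().kthvalue(momentum.numel() - k + 1).values
    mask = (momentum.abs() >= thresh).to(param.dtype)
    param.data.add_(momentum * mask, alpha=-lr)
[/code]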
>>
I'm enjoying Llama 70b 3.1
>>
>>101646785
I need to try 8b 3.1 again now that there were some fixes.
After I'm done trying nemo finetunes.
>>
I have 8 gigs of vRAM. How do I calculate how much context I can use with mistral before it explodes?
>>
>>101646785
Buy an ad, zuck
>>
>>101646804
I just test it out binary search style after doing the rough adjustments.
Look at vram usage using GPU-Z to make things easier.
Also, remember that FA, quantized cache, and a lower blas batch size all free up VRAM for context and layers.
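If you'd rather estimate than bisect: the f16 KV cache grows linearly with context. A rough sketch, assuming Mistral 7B's config values (pull yours from config.json; weights and compute buffers come on top of this):
[code]
n_layers   = 32   # num_hidden_layers (Mistral 7B)
n_kv_heads = 8    # num_key_value_heads (GQA)
head_dim   = 128  # hidden_size / num_attention_heads
bytes_el   = 2    # f16 cache; roughly halve for q8_0

per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_el  # K and V
for ctx in (4096, 8192, 16384, 32768):
    print(f"{ctx:>6} ctx -> {per_token * ctx / 2**30:.2f} GiB")
# 8192 ctx -> 1.00 GiB
[/code]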
>>
gonna pick up a mac studio for local llm work, is there any reason to spring for the 192gb model over the 128gb? it seems like most models would fit fine in the latter.
>>
Do you like listening to anything while you're proompting, anonymous?
>>
>>101646785
Then why are people still using miqu?
>>
>bought chub mercury after industries sonnet died
>mercury is supposed to be lewd, intelligent and uncensored
>can never get it to do anything actually sexy
>never progresses sex or the story
>constantly repeats itself
How do I make it good
>>
>>101647053
>he bought
>>
File: 1722399530383403.png (1.16 MB, 680x1069)
>he buyed anything from lore
KEKYPOWWWW
>>
>>101647053
AHAHAHAHHAHHA
>>
>>101647059
I was super desperate ;-;
I got REALLY into this RP world and suddenly Claude died
I was in shock
>>
>>101647069
nigger in WHAT WORLD would you think that chub's models (for $5/month nonetheless) would be even REMOTELY close to claude??????????????????????
>>
>>101647075
A desperate man with nothing to lose can do some stupid things
>>
>>101647084
well you lost $5, congrats, that'll be a cheap lesson. if you must know, claude 3.5 sonnet costs $3/$15 per 1M input/output tokens, so at ~20k context RP you'd be spending $0.075 (yes, 7.5 cents!) PER MESSAGE. $5/month doesn't compare in the slightest. Claude is the best RP model, but it's not cheap as it's a medium-tier flagship model (currently the best in the world actually, just waiting for 3.5 Opus)
>>
>>101647084
I'd rather just wait for mistral large at 0.6T/s.
>>
>>101647092
well fuck
>want to keep claude
>poorfag
Hal, please come up with a solution to this quandary
>>
>>101647053
>be a retarded /aicg/ locust
>proxy dies
>throw money at an absolute shit cloud model
>cry about it in the local models general
???
>>
>>101647105
>Hal, please come up with a solution to this quandary
Find a high paying job
>>
So is there a local model on par with Claude?
>>
>>101647119
yes, llama 3.1 405b
>>
id pay for opus if they would let me
>>
>>101646981
Nothing but coil whine/buzzing. The sound of magic.
>>
>>101647133
openrouter, aws, gcp - they all let you pay
>>
My PSU exploded...
>>
>>101645878
It's good. But I still like the feel of suno better. They both prompt way differently and I'm more used to how suno prompts I guess.
I made you a song /lmg/ <3
https://suno.com/song/145a1f3c-965b-495d-8cbc-7749b3cf1c6f
Single prompt song.
>>
>>101647215
photo or fake
>>
>>101647215
my parents' PC PSU burned internally and had a terrible chemical smell; it was a random chink PSU for shitboxes, then I replaced it with a CX550
>>
>>101647215
Back in the day an exploding PSU would kill your PC.
>>
>>101647242
I remember once I had a PSU go and it took the GPU along with it. big oof.
>>
>>101647226
Lovely song, Anon
>>
File: 1709648543085.png (284 KB, 512x512)
>>101647226
based cat man returns with a banger
>>
>>101646981
Mozart, Chopin or Ludwig Van.
>>
>>101647226
Masterpiece.
>>
I released sunfall on llama 3.1 8b. Please try it.
Also to other model makers: why aren’t you using LimaRP-DS yet? Should I convert it to jsonl?
>>
>>101647576
slop
>>
>>101647580
What slop? It’s literally been deslopped. It’s even in the name.
>>
>>101646690
Oh, well alright. Thanks, I'll watch it then.
>>
>>101647576
I'm just not doing much training right now because there's a heat wave here at the moment and my rig is in my bedroom.
>>
>>101647751
That's fair. Sauna world here too.
>>
>>101647041
I'm still using Miqu almost a year later because all the recent shit is annoyingly aligned. Gemma2 just wants to talk in bullet points and overtalk you. Llama3.1 just outputs word salad after 8k at the q2 quant I have no trouble running miqu at. Nemo is too small. Large is too fucking large. Although too small and too large have basically been the whole fucking deal in 2024. I had high hopes for gemma2 since it's 27b, but given all the bullshit I'll just stick with miqu until Mistral finally releases a 32b-range model.
Really I just want something that'll fit in 16gb at at least 4km, but preferably 5km, so I have enough room for 32k context. Running these 70bs at q2s, just barely squeezing 8k, has sucked long enough.
Without fail, every model I've tried, and I've tried hundreds now, just fails to meet decent expectations of either being an assistant or a roleplayer. Only Claude has impressed me.
>>
File: FHeqYmXWYAAdheh.jpg (193 KB, 1080x1080)
>>101647844
Couldn't have said it better myself, amen.
>>
>>101647844
skill issue
>>
File: GRXGUuvXkAE3DQL.png (2.2 MB, 1948x2757)
is nemo supposed to work on ooba or koboldcpp? i've been trying to load it in both for a while and not having much luck
>>
>>101647879
I think so, unless I have a skill issue with miqu, I actually got good results with nemo.
>>
>>101647844
>Only Claude has impressed me.
well just use claude then, retard
>>
>>101647971
money doko..
>>
>>101647950
99% broken quant, use bart's https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
>>
>>101647844
>Gemma2 just wants to talk in bulletpoints and overtalk you.
Skill issue. Also if you've used them enough you will notice Claude, Gemma and GPT4o all have the same intelligence, but Claude and GPT4o have more knowledge, so what we're getting is essentially compressed versions of those two.
>>
>>101648082
>Claude, Gemma and GPT4o all have the same intelligence
LMAOOOOOOOOOOOOOOOOOOOOO
>>
>>101648082
>Claude, Gemma and GPT4o all have the same intelligence
don't take the bait fucking please
>>
>>101646197
https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>>
>>101648009
thanks, which one have you used specifically? I'll get the exact same one just for testing purposes. So far i've tried dolphin at Q6 and the base at Q8 and neither are working
>>
>>101647844
can u share logs miqu vs opus
>>
File: file.png (158 KB, 734x801)
>>101648112
>which one have you used specifically
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/resolve/main/Mistral-Nemo-Instruct-2407-Q6_K.gguf?download=true
this works 100% in kobold 1.71.1
>>
>>101648088
>>101648099
Have you tried giving them tasks they all know? You basically get the same responses verbatim. 4o mini pretty much confirms what I'm saying to a large extent.
>>
>>101648216
nah dude, Claude 3.5 just gets what I want from it, with gpt4-o it can happen but less; gemma is nowhere near their level, you're tripping
>>
The mistral nemo template seems to suggest you should put system message last, but Silly doesn't seem to be doing this. What am I missing? I could have sworn this exact convo happened here recently.
>>
>>101648241
Maybe try putting something in last_assistant_prefix?
>>
>>101648227
And how do you know Gemma is not just undertrained or too small in that specific task? More than likely the higher quality data used to train Claude is paying off.
>>
>>101648261
>And how do you know Gemma is not just undertrained or too small in that specific task?
why should I care? it's up to google to up their level to claude's, I'm just playing around with the best, I'm not here to do charity
>>
>>101645719
Largestral is 123B dense, need more than 2 to run it well
>>
>>101648277
i'm not sure why you're here at all, except the obvious reason of you being a moron
>>
>>101648260
Right. People were throwing this
https://files.catbox.moe/6ae9ht.json
https://files.catbox.moe/2f13of.json
around and it too does not seem to be doing the system-last thing, so I'm curious if people are actually doing it or not.
>>
>>101648170
thanks, downloading it now.
>>
>>101648298 (me)
Or I guess "last_output_sequence" is this, now that I look closer. Huh.
>>
>>101648293
why am I here? because I'm rooting for local, but I'm not coping like you, pretending that gemma is at the level of C3.5, that's insane to believe something like that
>>
>>101648277
Like I said, you haven't conducted enough tests. For most of what I make them write, they are the same. The few cases where Claude is better are when I need a particular topic that requires vast knowledge.
>>
>>101648298
Well, I'm not. I just use the most basic things. Just the mistral presets with the spacing fixed.
>>
>>101647576
yes please, I fucking hate the limarp build shit, a jsonl would make everything easier
>>
>>101648311
i wasn't the guy coping, but there's not much point complaining in here that you want claude running locally when there's nothing any of us can do to help you about that. there's 20 posts like this in every thread, and forgive me if my assumption is wrong that they're all made by retards
>>
>>101643652
can you test in a similar scenario with llama.cpp?

Like, offloading the same amount of layers, with the same size quant.

I think that's where the real test is. If it's better than llama.cpp+gguf

I'll do this test myself but I currently have my GPUs busy doing benchmarks
>>
File: fuck you.png (1.03 MB, 768x768)
https://fyrean.itch.io/matxinh
>>
>>101648320
OK. Will restore original char names and just do
{"actor":"name","content":"text"}
{"actor":"name"...}
in the next update.
>>
>>101648311
I just pretend only local exists.
>>
>>101648394
That's stupid. Use claude for shit you don't care about leaking and use miqu for shit you want to keep locked down. That simple. Except you're fucked if miqu can't handle your private shit for you. For now.
>>
>>101646003
Isn't vLLM like the gold standard for an inference engine?

Any test to reproduce this?
>>
>>101648416
>he doesn't know
>>
>>101648386
Thanks, that's very appreciated.
>>
>>101645367
Will depend on the use case.
The code specific to the quantization format has just been copy-pasted from llama.cpp and minimally edited to fit vLLM.
For small contexts, a single user, and a single GPU I don't expect it to be faster because the code in the PR is missing some more recent optimizations.
Presumably since vLLM uses PyTorch there are other optimizations unrelated to the quantization format but I don't think they would be enough to offset the comparatively older code.
For large contexts vLLM could very well be faster since they to my knowledge use the original FlashAttention code which I think is still faster (for prompt processing) than the llama.cpp implementation.
For multiple GPUs or multiple users vLLM could also very well be faster since those use cases are poorly optimized in llama.cpp.
>>
>>101646003
>I observed weird nondeterministic behavior when I used vllm
If I had to guess this is due to atomic adds in the original FlashAttention implementation.
It has an option for deterministic results but that makes it slower.
>>
>>101648416
>miqu can't handle
Yeah, I'd prefer it to be a little more reserved.
>>
>>101648477
NTA, would that explain the thing I've observed when using llamacpp in ooba with deterministic sampling settings, where the first gen will be something non-deterministic and then all subsequent regenerations will be correctly identical?
>>
>>101648382
now that is some actual ugly face...
>>
>>101648584
No, I think that is due to prompt caching.
On the first run the first token is generated with a batch size >1, in subsequent runs the first token is generated with batch size 1.
And the results will be slightly different depending on batch size.
>>
>>101648610
ahh, thanks for clearing that up
>>
>>101645414
>Local has been better than whatever garbage that is for quite a while
And yet you'll never name a single one that holds conversations as well as it.

That's why you just have fags like you say
>local models do it better
Without ever naming the local model that does.

Even Mistral Large is nowhere close
>>
>>101648652
Maybe local is too difficult for you, anon. Don't worry, it will get easier to use with time.
>>
>>101648450
>Will depend on the use case.
Holy shit is that a Emmanuele Bassi reference?!
>>
>>101648709
I have no idea who that is.
>>
>>101648652
There is a real simple reason why this is. It's because there isn't anything good. They're just trolling you. If anything was hot shit then everyone would be shilling it, here, reddit, and on huggingface's trending. Nothing is. It's just FOTM garbage that underperforms. Always.
>>
File: 1372665917207.png (15 KB, 400x400)
>>101648682
>still hasn't named a single model

It's ok anon, I would also cope if some shitty free website had a model that mogged literally everything localfags have been swearing by for years now.
>>
What's the best way to format a character card for use in local?
>>
>>101648610
how to achieve 100% (or as close as possible) deterministic inference in the llama.cpp backend? Which factors apart from sampling/temp etc. affect determinism in llama server and the most popular middlewares that use llama.cpp as a backend?
>>
>>101649046
The llama.cpp HTTP server should be 100% deterministic by default.
Issues can only arise if prompt caching is explicitly enabled or if --n-parallel is set to a value >1.
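For example, a request intended to be reproducible could look like this (field names match recent server builds to my knowledge, so verify against your version):
[code]
import requests

# Greedy sampling, fixed seed, prompt cache off; with --parallel 1
# (the default) repeated runs should return identical completions.
r = requests.post("http://localhost:8080/completion", json={
    "prompt": "2+2=",
    "n_predict": 8,
    "temperature": 0,       # greedy decoding
    "seed": 42,
    "cache_prompt": False,  # avoid batch-size-dependent first tokens
})
print(r.json()["content"])
[/code]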
>>
>>101649089
do you mean the -np or --parallel ?

damn I didn't know about this. I'm running right now the MMLU-Pro benchmark using https://github.com/chigkim/Ollama-MMLU-Pro and changed the parallel to 10 to increase the speed.

Do you know if this is the case for the rest of the engines? like vllm. I also did the test for the unquantized benchmark using vLLM's OAI API endpoint.

What is the nature of not being deterministic when --parallel >1?
>>
>>101649089
why/how both parallel and prompt caching make output non-deterministic? what gives? Could you explain the mechanism?
>>
>>101649141
>>101649207
>do you mean the -np or --parallel ?
Yes.

>I'm running right know the MMLU-pro benchmark
If the benchmark code is written competently it will output not only a score but also an uncertainty for the score.
Even with nondeterministic behavior the combination of score and uncertainty should be valid: two results that are different or the same within uncertainties should still be different or the same with nondeterminism.

>Do you know if this is the case for the rest of the engines? like vllm.
Isn't vLLM already nondeterministic in the first place?

>What is the nature of not being deterministic when --parallel >1?
(With CUDA) the results of matrix multiplications and FlashAttention are not bit-for-bit identical if you vary the batch size.
And with continuous batching the batch size for a specific model evaluation depends on how exactly the requests arrive which is a bit random.
Also the position of tokens in the llama.cpp unified KV cache will be slightly different depending on how requests arrive which will lead to slightly different results.
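The underlying effect can be shown without a GPU: floating-point addition is not associative, so any kernel that changes its reduction order with the batch size can change the result. A toy demonstration (plain numpy, nothing llama.cpp-specific):
[code]
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000).astype(np.float32)

s_fwd = np.float32(0.0)
for v in x:            # one summation order
    s_fwd += v
s_rev = np.float32(0.0)
for v in x[::-1]:      # same numbers, reversed order
    s_rev += v

print(s_fwd == s_rev)  # almost certainly False
print(s_fwd, s_rev)    # equal except for the last few bits
[/code]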
>>
File: 1722418264430.gif (766 KB, 211x252)
SAO
GET OFF YOUR ASS AND MAKE 3.1 EURYALE
NOW NOW NOW
>>
>>101649257
first donate to him
>>
Any macfags tried MLX server as backend for Tavern? Do the presets and everything else work? Is it faster?
>>
File: 1972418769306.jpg (2.19 MB, 1080x6542)
>>101649257
now
>>
>>101649256
>Isn't vLLM already nondeterministic in the first place?
I'm 100% sure lots of researchers don't realize that's the case.
What's the reason vllm is non-deterministic to begin with, is atomic add the only reason?
>>
>>101649337
what garbage model is that
>>
File: file.png (58 KB, 801x598)
>>101649257
>>
>>101649349
As I said, I THINK it's nondeterministic since the original FlashAttention implementation is if you run it with maximum performance but I did not actually confirm this.

>What's the reason vllm is non-deterministic to begin with , is atomic add the only reason?
It's the main reason since it allows you to get better performance.
But requests arriving in a non-defined order could also cause nondeterministic behavior.
(Or just bugs but that is a given.)
>>
>>101649371
Dunno some retard left his ST open and the link and logs got posted in aicg
>>
>>101649256
is rpc server deterministic in llama.cpp provided both parallel and prompt cache is disabled?
>>
>>101649405
Don't know.
>>
>>101649349
>I'm 100% sure lots of researchers don't realize that's the case.

I think that's the case. vLLM is regarded as the gold standard and it's used for most of the benchmarks.

Being 99% deterministic is completely fine for normal use, even for coding.
But the one thing where you want to be 100% is benchmarks.
>>
What causes models to keep using the same sentences/paragraphs from previous posts?
>>
>>101649504
being from mistral
>>
>>101649531
It's happened with other models though. It seems to be independent of samplers, as well. The only possibility seems to be a problem with Kobold itself, since nothing else that is changed prevents it.
>>
>>101649504
It's just how they are. They pick up on patterns from the context and repeat them. Different models to different degrees. Maybe increase temperature a little. What works the best, for me at least, is just giving it something to work with. Boring input, boring output.
>>
>>101649408
>>101649389
IMHO this needs to blow up. A bunch of researchers are running benchmarks and posting results without realizing their engines are giving nondeterministic outputs. Then they're comparing these wonky results to other equally wonky stuff in fancy tables. Barely any of them bother to run the same benchmark multiple times and average it out. It's kinda ridiculous desu
>>
>>101643089
Not really news but I've released my neural text to speech library (babylon.cpp) for anyone that wants to play around with it.

It's not a new model or anything, it's basically just a rewrite of piper (VITS) which uses a different phonemizer (DeepPhonemizer), and unlike Piper it actually compiles without issue.

https://github.com/Mobile-Artificial-Intelligence/babylon.cpp

>t.dane
>>
>>101649638
I agree. Check this https://github.com/TIGER-AI-Lab/MMLU-Pro/issues/10

Between the changes in tokenizers, chat templates, and non-deterministic engines, benchmarks are all over the place.

I haven't seen a detailed methodology for the benchmarks the researchers do. There are so many variables to control and I don't think they are controlling them at all. They just post a table.

Even mistral posted a blog post with different 405B results (paper and measured)
https://mistral.ai/news/mistral-large-2407/
>>
>>101649638
As I said, if the benchmarking software was written competently it should give you not just a result but also an uncertainty.
Then you would see something like model A with 50+-2, model B with 51+-2, and model C with 60+-3.
In this case you would be able to tell that there is no statistically significant difference between A and B but that C is significantly better.
Uncertainties should be provided regardless of whether or not the benchmarking code is deterministic anyway, because differences in benchmark results are not always statistically significant, and in the large-sample limit where they are, the effect of nondeterministic code should be negligible.
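For a pass/fail benchmark the uncertainty is just the binomial standard error. A minimal sketch (the 50+-2-style numbers above are purely illustrative):
[code]
import math

def score_with_uncertainty(correct: int, total: int) -> tuple[float, float]:
    # Accuracy and its binomial standard error, both in percent.
    p = correct / total
    se = math.sqrt(p * (1.0 - p) / total)
    return 100.0 * p, 100.0 * se

acc, err = score_with_uncertainty(612, 1200)
print(f"{acc:.1f} +- {err:.1f}")  # 51.0 +- 1.4
[/code]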
>>
>>101644824
>*Bruno says sarcastically*
>hey, look, the model understands sarcasm!
I'm actively losing brain cells in this general
>>
lumimaid is not very good
>>
File: graph.png (7 KB, 502x397)
7 KB
7 KB PNG
>>101649504
>>
>>101649614
>Boring input, boring output.
Doll play enjoyer.
>>
so mistral large has been working amazingly, any merge or new models that have emerged recently from the big update dump?
>>
When are we gonna get an uncensored llama 3.1 405b on openrouter?
>>
>>101649723
Nemo-based models don't work in Kobold yet. Something about custom tokenizers.
>>
>>101649897
You aren't. The word is that 3.1 is the most zogged model ever.
>>
>>101643089
this seems like new drama. Benchmark results are rigged. reeee
>>101649408
>>101649678
>>101649638
>>
>>101649614
>Boring input, boring output.
Not all models are like that though. I could write one-liners and some models would still keep the length of their responses at 2-3 paragraphs and keep it engaging.
Not Nemo though, it would instantly mimic the writing style and respond with 1-2 sentences max.
>>
>>101649937
I will need a source for that statement.
>>
>>101649952
>benchmarks are memes and don't mean anything
Tell us something we didn't know for 3 years already.
>>
>>101649937
why keep spreading misinfo? it's been working fine since 1.71. some quants on hf are fucked tho, as usual
>>
>>101650012
>Merged fixes and improvements from upstream, including Mistral Nemo support.
https://github.com/LostRuins/koboldcpp/releases/tag/v1.71.1
working quants:
https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF
>>
>>101649993
>Not all models are like that though... some...
As i said. Different models to different degrees.
>Not Nemo though, it would instantly mimic the writing style and respond with 1-2 sentences max.
Not quite the same problem from your other post.
>What causes models to keep using the same sentences/paragraphs from previous posts?
Repeating sentences from previous outputs, at least for me, happens when i don't give it enough to do. There's only so much it can say if the situation stays the same.
Responding in short sentences when you input short sentences i consider it a bonus. If i ask a person "how's it going?" i don't expect their life story, and i'd be annoyed if i got that. I expect the same from llms. Maybe nemo is just not for you.
>>
>>101649937
its llama 3.1
>>
>>101650049
you just posted the link saying that kobold supports nemo, I don't know why it wouldn't be the case for finetunes since they are on the same architecture
>>
>>101649695
yes sir, but that's on the condition that your benchmarks are spitting out standard deviation values. Not saying ML research needs to follow the 5-sigma rule, but if you don't even realize that the final results your MMLU or RULER runs report aren't deterministic to begin with, you can't tell if they're significant or not.
If your GPU overheats and starts flipping bits, or some zooming neutron from Andromeda smacks into it, your software won't know shit's gone sideways until you run a few trials. Like, how's MMLU supposed to know something's fucked up somewhere in the LLM engine or the GPU's matrix accelerator?
>>
>>101650117
correct, was refuting the other anon's incorrect info. can also anecdotally confirm "Lumimaid-v0.2-12B-Q6_K" runs fine on 1.71.1, which I think I got from bart as well
>>
What are your thoughts on some people thinking neuroscience is important for developing AI?
>>
>>101650027
what can we use to know which model would be better for a particular task? Should we just develop our own set of tests and manually review each response? seems very time consuming. At least benchmarks give us an idea of where to start and what to choose
>>
File: Peek.jpg (48 KB, 507x774)
48 KB
48 KB JPG
I am hosting a model on the horde, koboldcpp used to display the prompts but now it doesn't.
How do I voyeur?
>>
>>101650178
You modify the code.
>>
>>101650189
How?
t. codelet
>>
>>101650107
>As i said. Different models to different degrees.
Yeah, but Mistral models are known for having heavy issues with repetitions.
>Not quite the same problem from your other post.
I'm not the anon you were responding to anyway, but it's the same problem, just manifesting in a different form. The repetition problem comes from picking up patterns like you said; it may repeat words, entire sentences, or even the general feeling of responses, their length or layout. Autistically latching onto every single thing is not a good thing in a model.
>If i ask a person "how's it going?" i don't expect their life story, and i'd be annoyed if i got that. I expect the same from llms
I agree, but that's not Nemo. A good model would be flexible and respond accordingly. When I said 2-3 paragraphs I didn't mean giving 2-3 paragraphs of dialogue in response to something trivial, but always adding something for you to work with or pushing the plot/situation forward. Nemo just copies patterns regardless of whether they make sense.
>Maybe nemo is just not for you.
I find Nemo sovful and interesting, but the downsides are too irritating and severe to ignore. It has big potential but it must be fixed, maybe by finetuning.
>>
>>101650130
As I said, uncertainties should be calculated and reported EVEN IF THE SOFTWARE IS DETERMINISTIC.
The prompts in a benchmark are essentially a random sample from the distribution of prompts that a human would come up with and the score is supposed to approximate the generalized model performance on this distribution.
Calculating an uncertainty for the score should be the lowest bar to meet for serious scientific work.
The correct thing to do would be to calculate the full covariance between models and use that to determine whether there are statistically significant differences.

>Not saying ML research needs to follow the 5-sigma rule, but if you don't even realize that the final results your MMLU or RULER runs report aren't deterministic to begin with, you can't tell if they're significant or not.
You can expect differences from nondeterminism to be uncorrelated, so the effect on the final score with n samples will scale with 1/sqrt(n).
This is the same scaling that you will get from the benchmark prompts effectively being a random sample.
The only effect of nondeterminism is that the uncertainties on your results will be larger and that the interpretation of the uncertainty would be slightly different to include the randomness from the software.
Nondeterminism doesn't fundamentally break a benchmark, it only weakens its power to separate model performance given a fixed amount of input data.
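One cheap way to do the correlated comparison described above is a paired bootstrap over the per-question results: resampling the same question indices for both models preserves their covariance. A minimal sketch:
```python
import random

def paired_bootstrap(a, b, iters=10_000, seed=0):
    """a, b: per-question 0/1 results for two models on the same prompts.
    Returns the fraction of resamples where model a outscores model b;
    values near 0.5 mean no statistically significant difference."""
    rng = random.Random(seed)
    n = len(a)
    wins = 0
    for _ in range(iters):
        idx = [rng.randrange(n) for _ in range(n)]  # resample questions
        wins += sum(a[i] for i in idx) > sum(b[i] for i in idx)
    return wins / iters
```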
>>
>>101649504
Lack of DRY penalty
>>
In sillytavern terms, what does this mean? Where do I put the [...] or what
>>
>>101650314
>anon.exe has stopped working
>>
>>101650340
It's Altman's GPT4 bot having an issue. Maybe he switched it to 4o-mini and it doesn't handle the prompt as well as the old version did.
>>
>>101650314
In the context of Silly Tavern, a popular online role-playing game (RPG) platform, square brackets `[...]` are used to denote actions or descriptions that are happening in the scene. These actions or descriptions are typically written in italics for visual distinction. Here's how you can use `[...]` in Silly Tavern:

1. **Actions**: Use `[...]` to describe what your character is doing. For example:
```
*John walks over to the bar and orders a drink.*
[John walks over to the bar and orders a drink.]
```

2. **Emotions or Reactions**: You can also use `[...]` to show your character's emotional state or reaction to something. For example:
```
*Mary sees the monster and screams in terror.*
[Mary sees the monster and screams in terror.]
```

3. **Internal Thoughts**: Although less common, you can use `[...]` to represent your character's internal thoughts or dialogue. For example:
```
*I wonder if I should try to fight the dragon or run away.*
[I wonder if I should try to fight the dragon or run away.]
```

Remember, the use of `[...]` is optional, and many players prefer to use italics (`*...*`) or quotation marks (`"..."`) for actions and descriptions. The key is to be consistent with your formatting choice throughout your posts to make it easier for other players and the game moderators to understand the narrative.
>>
File: hahaha.jpg (8 KB, 226x223)
8 KB
8 KB JPG
>>101650304
>As I said, uncertainties should be calculated and reported EVEN IF THE SOFTWARE IS DETERMINISTIC.
100% agree, but that's not the case in LLM land, and that's the issue. You see this >>101649678: not only do they not calculate the uncertainty, but most likely they have no clue there's any uncertainty in their shit at all. kek
>>
File: file.png (12 KB, 344x417)
12 KB
12 KB PNG
>>101650314
>>101650340
>>101650411
The fuck? The image didn't go through.
>>
File: 1698161493053949.jpg (939 KB, 1920x1080)
939 KB
939 KB JPG
>>101650511
That's your question?
>>
>>101650547
Answer or don't, virgin computer toucher. No one cares about the theatrics you use to try and give your life meaning.
>>
Why do youtube videos still have shitty subtitles when just running them through an LLM basically fixes them? Why hasn't someone fixed this?
>>
>>101649046
>>101649089
>>101649141
>>101649256
>>101649349
>>101649389
>>101649638
>>101649678
>>101650472
>not deterministic
why does it matter retard
>>
File: wow hard.png (18 KB, 344x176)
18 KB
18 KB PNG
>>101650511
>>
File: FnykbnMWYAABHDC.jpg (37 KB, 525x619)
37 KB
37 KB JPG
>>101650557
>*sniff* Answer or don't, *sniff* virgin computer toucher. *sniff* No one cares about the theatrics you use to try and give your life meaning. *sniff*
>>
>>101650314
>Where do I put the [...]
>>101650511
holy brain damage can't understand placeholders
>>
oh god, don't tell me the retarded anon that keeps confusing non-deterministic with non-symbolic is still here
>>
>>101650591
>>101650612
>I'm retarded, don't know, and I'm trying to save face
I see. You dropped out of kindergarten and missed the alpaca-LIKE rather than alpaca, as well as the </s> suffix not being included in the preset in your picture. Easy mistake to make, anon. I mean, if you are a spastic retard whose tongue is too big to fit in your mouth. Better luck next time.
>>
>>101650585
because for a benchmark you want repeatable and measurable results. That's the nature of a benchmark
>>
>>101650304
>EVEN IF THE SOFTWARE IS DETERMINISTIC.
There's no need to yell.
>>
>>101650667
keep digging, i'm sure you'll find out anon, we're counting on u!
>>
>>101650675
How else am I supposed to emphasize part of the sentence in the absence of cursive or bold text?
>>
>>101650710
(((text here)))
>>
File: openwiiiiide.png (52 KB, 524x458)
52 KB
52 KB PNG
>>101650667
>>
>>101650585
because if you niggers compare apples to apples, you'd better be sure they're all apples, and not bananas.
you don't want your accountant to use a broken calculator to calc your taxes, but you probably don't pay any, so just be sure your food stamps ain't random and your free bananas are all fresh, nigger.
>>
>>101649723
Only good nemo tunes so far, from my testing, are magnum-mini and celeste.
The official instruct is fine, even if its vocab isn't very varied.
>>
>>101650787
quick question - do you think neural networks are inherently deterministic?
>>
>>101650511
>>101650557
>>101650667
All that baby rage to run moxxie's nemo tune huh, guess the target audience makes sense.
>https://huggingface.co/BeaverAI/mistral-doryV2-12b
>https://huggingface.co/BeaverAI/mistral-doryV2-12b/commits/main
>Fizzarolli commited on 9 days ago
>>
>>101650737
>repeating what I said in an attempt to stop the humiliation
Lmao browbeaten faggot >>101650736
>>
>>101650829
This dumb virgin got so mad he tried to pull some wannabe dox or something
>>
>The guy that doesn't know what a placeholder is thinks somehow he's humiliating anyone but himself
>>
>Can't attach a pic
>what's a placeholder???
>virgin
>virgin
He really flew in straight from Discord.
>>
>>101650890
>made a mockery of himself so now he has to try le ebin book of buzzwords
>N-no he is a tranny! H-he is from discord or something!
Fragile bitch
>>
>>101649504
I think it's a prompt issue.
>>
>>101650820
That's a tricky question. You could make them so, but in reality, due to multiple factors in the way they're implemented these days (quanting, rounding errors, op orders, compiler hacks), usually they're not. But still, if you do benching and compare your shit to other shit, you'd better be sure you know what your shit is doing underneath. I prefer to go as deterministic as possible, cos that's a good rule of thumb in science.
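For what it's worth, PyTorch at least lets you pin things down pretty far if you accept the speed hit; a minimal sketch, assuming a recent PyTorch with CUDA:
```python
import os
import random

import numpy as np
import torch

# Opt into deterministic kernels; this usually costs performance, which
# is why inference engines don't do it by default.
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"  # required by some CUDA ops
random.seed(0)
np.random.seed(0)
torch.manual_seed(0)
torch.use_deterministic_algorithms(True)  # raise on nondeterministic ops
torch.backends.cudnn.benchmark = False    # no autotuned (variable) kernel picks
```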
>>
File: Peek2.jpg (30 KB, 459x697)
30 KB
30 KB JPG
>>101650178
Please.
>>
>>101649895
https://huggingface.co/leafspark/Mistral-Large-218B-Instruct
It makes it smarter.
>>
>>101650941
>>101650906
>>
>>101650941
>-h
>>
>>101648170
this one worked btw, thanks again
>>
>>101650922
I was just checking because there was a retarded anon here who was arguing they aren't deterministic at all as an algorithm (regardless of hardware quirks like rounding etc.)
>>
>>101650941
we can't have nice things because of people like you, fuck off kindly
>>
>>101650981
--quiet Enable quiet mode, which hides generation inputs and outputs in the terminal. Quiet mode is automatically enabled when running a horde worker.

>>101651002
Why? What did I do?
>>
>>101650941
if you want to break the horde social contract you have to earn it by being smart enough to figure it out yourself
>>
>>101650955
>It makes it smarter.
Nah, self-merges make it more creative, make text fancier for RP. When coding, self-merges performed worse when I tried them. And that one seems broken, wouldn't even try.
> - layer_range: [70, 87]
Should be
> - layer_range: [70, 88]
>>
>>101651014
>Why? What did I do?

>join a nice initiative about crowdsourcing models so vramlets can rp with their models
>hehe anons, how can I shit on their privacy and look up what they are actually writing so I can jerk off
and you have the audacity to ask why I told you to fuck off? One of the main reasons this general exists is that we value privacy, which we can't achieve through corpo API models.
>>
Does mistral large work that well even at 2bit because of that quantization aware training?
>>
>>101651101
>because we value privacy which we can't achieve through corpo api models
nta. But running through horde shouldn't come with an expectation of privacy. It's no different from the 'corpo api models'. If changing a line of code on the server disables that 'privacy' without the client knowing, that's no privacy at all.
>>
>>101651101
nta but it's also to educate the naive fucks who think horde hosters (or any cloud provider for that matter) respect their privacy. anyone using horde or chatgpt or something to do embarrassing shit may think twice after reading this thread.
>>
>>101650955
lmao it doesn't
you're basically achieving the same thing as cranking up the temperature, for 2x the RAM used
>>
>>101651002
>>101651101
You're baiting or you're a legit retard. This kind of thing will always happen, anon isn't the first to want to do this and won't be the last.
>>
>>101651157
>>101651157
>>101651157
>>
>>101651142
>>101651146
>>101651163
nobody expects 100% privacy through horde but communities like this should strive to ostracize retarded monkeys that work against the interests of that community, not encourage them. At least I hold the opensource circles to a higher standard than corpos and expect better from them.
>>
>>101651227
>At least I hold the opensource circles to a higher standard than corpos and expect better from them.
kek
>>
>>101651227
I only want to watch anon. I won't judge.
>>
>>101651227
Stupid behavior. You should know that, if something is exploitable, it will be exploited.
You should ostracize retards with low morality only if that ACTUALLY affects other people negatively, like cheating in online games.
>>
>>101651227
I'm not saying this is not 100% private. I'm saying that it gives a false sense of security to naive people. You think that anon is the first to wonder how it's done? If anon wasn't a retard he'd have done it already, as i'm sure many others did already.
HORDE SHOULD NEVER BE USED UNDER ANY CIRCUMSTANCE.
Oh, no... do YOU use horde?
>>
>>101651002
It's not necessarily voyeuristic. When I tried doing that, I learned that 1) most users seem unable to prompt; and 2) Horde picks randomly from the available models for the proposed "scenarios".

No wonder the reported Horde experience seems to be pretty terrible on average. Hosting models there is utterly pointless.
>>
>>101651276
>You think that anon is the first to wonder how it's done? If anon wasn't a retard he'd have done it already, as i'm sure many others did already.
I know, all I did was tell him to fuck off. Or should I tell him how to do it and provide instructions on how to make the community worse?
>Oh, no... do YOU use horde?
nah, local all the way baby
>>
>>101651323
>Or should I tell him how to do it and provide instructions how to make community worse?
Yes.
>>
>>101651323
>Or should I tell him how to do it and provide instructions how to make community worse?
yes
>>
>>101651323
>how to make community worse?
Have you spent more than 20 minutes in these threads? Fuck "communities".
>>
>>101650994
That anon wasn't retarded. It was me, BTW, unless there were some other anons saying nonsense. I didn't say NNs weren't deterministic by their nature. I said that neural networks (especially huge ones trained on huge datasets) were inherently non-deterministic in training, which is true. That's the reason why even if you use the same dataset and the same settings, you won't get exactly the same weights with perfectly matching hashes in the long run. They have to be trained on real hardware. Three-body problem, broken pendulum problem, chaos theory. In backprop you either search for local minima or you aim for the global minimum in gradient descent. In the long run they're as deterministic as 3 black holes in an empty vacuum with no matter or radiation or even virtual particles at all. Yet after a few million years you can't reverse and track their paths back even if your precision is Planck-length sized. You'll miss by a few parsecs. Actually that's the issue in our close neighborhood too, like Saturn's moon Hyperion and its orientation on its orbit. So, determinism depends on the complexity and the scale.
I'm sure >>101650304 can explain that way better than me, since he's a particle physicist by trade.
>>
>>101651334
>>101651350
>>101651361
I'm gonna start putting malicious code into ggufs the moment I find a new overflow exploit just for you :3
>>
>>101651392
I convert models myself, retard. Just like with horde, everyone should convert their own models.
>>
>>101651377
I don't know if you are the same retard or a brand new one and honestly I don't care about finding out. Training with non-stochastic gradient descent methods and without some weird dataset reshuffling is entirely deterministic.
>>
>>101651427
I assume you also run them on your own software and on your own system kernel. Because if not, I could hide a surprise for you in a new llama.cpp/kobold merge if I were reputable enough.
>>
>>101651392
>>101651464
Spreading malware and looking at logs are completely different things. But I understand, you probably are autistic and can't understand these concepts very well.
>>
>>101651500
They differ only in a degree of maliciousness, but they are in the same category of being a dick. Be better.
>>
>>101651377
>I said that neural networks (especially huge ones trained on huge datasets) were inherently non-deterministic in training, which is true.
It is 100% possible to make neural network training deterministic.
But since training is so expensive, faster but non-deterministic training is usually preferred.

>They have to be trained on real hardware.
If you want to be really strict nothing is deterministic due to quantum physics but you can reasonably assume that there won't be random bit flips during training.
(Or at least none that the error correction won't catch.)

>Three body problem, broken pendulum problem, chaos theory.
I think you are confusing chaotic systems with nondeterministic systems.
Chaotic systems will have wildly different outputs given small perturbations of the inputs.
Nondeterministic systems will have different outputs even with the exact same inputs (e.g. the number of radioactive decays per time unit).
The classical three-body problem is chaotic but deterministic (if you ignore the effects of quantum physics).
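The logistic map is the textbook way to see that distinction: the same input always gives the same output (deterministic), but a 1e-12 nudge to the input gives a completely different trajectory (chaotic). A toy sketch:
```python
def logistic(x, steps=50, r=4.0):
    # x_{n+1} = r * x_n * (1 - x_n); chaotic for r = 4.0
    for _ in range(steps):
        x = r * x * (1 - x)
    return x

print(logistic(0.2))          # same value on every run: deterministic
print(logistic(0.2 + 1e-12))  # diverges completely after 50 steps: chaotic
```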
>>
>>101651598
Thanks Johannes, I think he is confusing (deterministic vs non-deterministic) with (numerical computations vs symbolic computations). Or he is simply overzealous in calling it non-deterministic because there is a one-in-a-million chance that radiation from the Sun hits the GPU and flips one bit, but like you said, you could argue from there that nothing is deterministic.
>>
>>101645266
>just run 405B on a single H100 cluster
Nah, easier for corporate types to go for the aaS offering, less money up-front. Also, there are isolated instances for those wanting more security. The "actually useful" stuff falls more into TTS/STT and BI, and not so much Q&A chatbots. As it is, it's like growing a big quartz crystal thing - lots of time and energy, and it's pretty when it's done, but if you want to modify it, you can't. A truly intelligent AI assistant is going to need long and short-term associative memory, and some sort of "forgetting" mechanism which tosses out data that's not important to remember.
>>
File: pepefroggie.jpg (38 KB, 780x438)
38 KB
38 KB JPG
>software people trying everything to make models run on consumer hardware
>radio silence from hardware people
Did everybody give up to Nvidia? Is anyone even trying? Or do all hardware people work for big GPU?
>>
>>101651835
Didn't llama.cpp merge support for some AI oriented accelerator the other day?
>>
>>101651835
when was the last time you saw successful open-source hardware like a GPU, with the manufacturing power to produce it at consumer scale? In software you can always count on some hackers to write thousands of lines of code to optimize shit; in hardware you are at the mercy of a few monopolies for which making GPUs like that isn't in their best interest.
>>
>>101646197
>Any good models for Japanese to English translations?
I used Gemma last weekend to make up r/l pair exercises for my ESL student. It was much better than CR+, which was surprising. They were challenging yet not absurd. CR+ was kind of low-effort about it. I would say Gemma has a pretty good "comprehension" of Japanese.
>>
there aren't any known upcoming big model releases for a while right?

opus 3.5, gemini ultra and gpt 4.5 or whatever are probably later in the year, and mistral, meta just played their hand

i guess cohere might surprise us with a model, but they seem more focused on corporate usage not retail
>>
>>101651835
>>radio silence from hardware people
If it were easy, it would be done. It's not easy. There's no easy solution for designing a chip which has tens of thousands of matrix multiply units, runs really fast, doesn't create too much heat, doesn't have cache or pipeline issues etc... and doesn't cost a fortune... and finding a fab with both a small enough process and time to work you in.
Don't worry, Battlemage will be out any day now...


