/g/ - /lmg/ - Local Models General - Technology

[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip / qa] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]

Board

▼ Settings Mobile Home

/g/ - Technology

Return Catalog Bottom Refresh

Thread archived.
You cannot reply anymore.

[Advertise on 4chan]

[Return] [Catalog] [Bottom]

Anonymous

/lmg/ - Local Models General 07/29/24(Mon)19:04:36 No.101628398

File: 1475124801572.jpg (484 KB, 1280x720)

484 KB JPG

/lmg/ - Local Models General Anonymous 07/29/24(Mon)19:04:36 No.101628398 Archived

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101619436 & >>101612988

►News
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407
>(07/23) Llama 3.1 officially released: https://ai.meta.com/blog/meta-llama-3-1/
>(07/22) llamanon leaks 405B base model: https://files.catbox.moe/d88djr.torrent >>101516633

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp

Anonymous
07/29/24(Mon)19:05:37 No.101628405

Anonymous 07/29/24(Mon)19:05:37 No.101628405

File: 1689204280114494.jpg (52 KB, 555x600)

52 KB JPG

►Recent Highlights from the Previous Thread: >>101619436

--Performance comparison between TabbyAPI/exl2 and llama.cpp, and potential optimizations: >>101624356 >>101624477 >>101624554 >>101624643 >>101624903 >>101625035 >>101625142 >>101625699 >>101625733
--Moore Threads GPU support added to llama.cpp, discussion on PR reviewing, hardware testing, and kernel changes: >>101621155 >>101621210 >>101621643 >>101621640 >>101622451 >>101622391 >>101622215 >>101622485 >>101622972 >>101623153 >>101623452 >>101623398
--Anon asks for ebook to audiobook AI recommendations: >>101620069 >>101620112 >>101624071 >>101621896
--Using a local model as a dungeon master and recommendations: >>101624149 >>101624189 >>101624484 >>101624531 >>101624746
--LLMs struggle with creative names: >>101621450 >>101621492 >>101621574 >>101621590 >>101621568
--GPU price inflation and SXM2 stability: >>101624072 >>101624171 >>101624178 >>101624953
--Anon seeks AI to classify 4chan memes and anime girls for Hydrus Network database.: >>101620372 >>101620413 >>101620450 >>101623719 >>101623738
--Anon asks for advice on selecting a single-function text-to-text model and dataset generation tips: >>101620533 >>101620552 >>101621139
--AI and image generation accessibility and quality, NAI, anime-style images, and inpainting: >>101619662 >>101619693 >>101619875 >>101621302
--Logs: Screenshot of NeMo's anti-adblocker message: >>101624837
--Mistral-Large repetition issues and potential solutions: >>101624502 >>101624662 >>101624719 >>101626059
--DRY sampler implementation update: >>101622482
--Anon releases a scene director ST addon: >>101619994
--3090 hacked driver and nvlink discussion: >>101620770 >>101621092 >>101621847
--Powerful laptop owner asks for best model, various projects shared: >>101621967 >>101622233 >>101622509 >>101623002
--Modified mistral prompt format shared: >>101625909
--Miku (free space): >>101625819 >>101627367

►Recent Highlight Posts from the Previous Thread: >>101619442

Anonymous
07/29/24(Mon)19:06:34 No.101628415

Anonymous 07/29/24(Mon)19:06:34 No.101628415

Miguuuuuuu

Anonymous
07/29/24(Mon)19:07:00 No.101628420

Anonymous 07/29/24(Mon)19:07:00 No.101628420

File: 3751eb64b065178dcdb363f87(...).png (1.16 MB, 680x1069)

1.16 MB PNG

VALL-E 2 paper released (https://arxiv.org/pdf/2406.05370) a month ago, rightfully to zero fanfare.

The only additions are:
>using a pretranscribed chink copy of LibriSpeech (rather than their existing in house transcription they already had from the original VALL-E for some reason)
>experiments in grouping timesteps at 2, 4, and 8 tokens per step (tables metrics show that it's for the worse and in theory it only only really matters for faster inferencing, despite no numbers for their inference times)
>DRY sampling (except they call it repetition aware sampling) that only activates if "conditions are met" and if not it just does actual sampling instead of greedy searching
>absolutely zero fundamental changes compared to their original VALL-E paper beyond that

The absolute state AHAHAHAHA

Anonymous
07/29/24(Mon)19:11:35 No.101628458

Anonymous 07/29/24(Mon)19:11:35 No.101628458

How are you integrating AI in your workflow?
Which cli/tui tool? Which editor plugins?

Anonymous
07/29/24(Mon)19:13:23 No.101628478

Anonymous 07/29/24(Mon)19:13:23 No.101628478

>>101628458
workflow? cli? editor? sir, we have sex with our AI here

Anonymous
07/29/24(Mon)19:14:19 No.101628484

Anonymous 07/29/24(Mon)19:14:19 No.101628484

>>101628458
Aider for cooding, my Telegram bot for quick questions (it has vision), big AGI, SillyTavern. All of that is powered by 3.5 Sonnet though.

Anonymous
07/29/24(Mon)19:14:43 No.101628490

Anonymous 07/29/24(Mon)19:14:43 No.101628490

>>101628420
Nigga what the fuck is that embed in your frog?

Anonymous
07/29/24(Mon)19:15:01 No.101628492

Anonymous 07/29/24(Mon)19:15:01 No.101628492

>>101628458
I can't code for shit so there's no workflow to begin with.

Anonymous
07/29/24(Mon)19:15:31 No.101628498

Anonymous 07/29/24(Mon)19:15:31 No.101628498

>>101628458
I'm looking forward to making a project of using AI to hopefully get my coding projects from scrappy prototypes to something finished. I'm hoping to find the perfect model for doing code review and cleanup kind of things, and the kind of Q&A that would go to Stack Exchange except without it being SEO out of date and arguments in the comments kind of shit.

Anonymous
07/29/24(Mon)19:16:27 No.101628507

Anonymous 07/29/24(Mon)19:16:27 No.101628507

>>101628458
I crank one out then get back to work

Anonymous
07/29/24(Mon)19:18:01 No.101628523

Anonymous 07/29/24(Mon)19:18:01 No.101628523

>>101628478
unironically more respectable than using it for "coding"

Anonymous
07/29/24(Mon)19:18:37 No.101628532

Anonymous 07/29/24(Mon)19:18:37 No.101628532

>>101628458
>Which cli/tui tool?
Using aichat when I have quick questions. I wish to try some RAG stuff but always too lazy and not sure if it would be useful.
>Which editor plugins?
I tried multiple on nvim, but never was fully satisfied. I tried in chronological order chatgpt.nvim, gen.nvim, gp.nvim. Each has its advantages,, but I rarely use them to be honest, mostly for wording in comments, emails, reviews, or commits.

Anonymous
07/29/24(Mon)19:18:43 No.101628533

Anonymous 07/29/24(Mon)19:18:43 No.101628533

>>101628420
sus

Anonymous
07/29/24(Mon)19:18:51 No.101628537

Anonymous 07/29/24(Mon)19:18:51 No.101628537

>>101628458
>>101619994
give it references and yell slurs at it until it does what i want

Anonymous
07/29/24(Mon)19:24:34 No.101628597

Anonymous 07/29/24(Mon)19:24:34 No.101628597

Why exactly does the generation speed (not including processing of prompt) slow down when the context is more full?

Anonymous
07/29/24(Mon)19:25:04 No.101628601

Anonymous 07/29/24(Mon)19:25:04 No.101628601

>>101628597
But it doesn't?

Anonymous
07/29/24(Mon)19:25:44 No.101628607

Anonymous 07/29/24(Mon)19:25:44 No.101628607

Do I need to set rope-freq-base to get the full 128k context with llama 3.1, or should it work out of the box on latest master?

Anonymous
07/29/24(Mon)19:26:26 No.101628617

Anonymous 07/29/24(Mon)19:26:26 No.101628617

>>101628601
It does for me. I'm not using any swap, I checked.

Anonymous
07/29/24(Mon)19:28:03 No.101628643

Anonymous 07/29/24(Mon)19:28:03 No.101628643

>>101628597
Because it has to do more reading?

Anonymous
07/29/24(Mon)19:28:45 No.101628655

Anonymous 07/29/24(Mon)19:28:45 No.101628655

>>101628643
Isn't that what the prompt processing part is for?

Anonymous
07/29/24(Mon)19:29:46 No.101628671

Anonymous 07/29/24(Mon)19:29:46 No.101628671

>>101628655
models predict the next token. 8k is less tokens to predict from than 16k, so 8k will be faster.

Anonymous
07/29/24(Mon)19:29:54 No.101628673

Anonymous 07/29/24(Mon)19:29:54 No.101628673

>>101628597
attention is quadratic time complexity

Anonymous
07/29/24(Mon)19:30:19 No.101628675

Anonymous 07/29/24(Mon)19:30:19 No.101628675

>>101628655
Processing turns the document into useful data, but there's gotta be more numbers to crunch if you have 3000 tokens in context than 300.

Anonymous
07/29/24(Mon)19:31:36 No.101628689

Anonymous 07/29/24(Mon)19:31:36 No.101628689

>>101628675
Why did someone say it doesn't slow down? Is my shit broken or not?

Anonymous
07/29/24(Mon)19:31:55 No.101628692

Anonymous 07/29/24(Mon)19:31:55 No.101628692

>>101628478
probably autocorrected from cumflow

Anonymous
07/29/24(Mon)19:32:07 No.101628695

Anonymous 07/29/24(Mon)19:32:07 No.101628695

File: 1705098153550196.png (17 KB, 530x170)

17 KB PNG

Huh? R+ on OpenRouter costs as much as 3.5 Sonnet? What?

Anonymous
07/29/24(Mon)19:40:23 No.101628807

Anonymous 07/29/24(Mon)19:40:23 No.101628807

24gb vram sisters we lost. I'm currently in the market for an oversized gpu frankenmonster. Any recommendations?

Anonymous
07/29/24(Mon)19:41:07 No.101628819

Anonymous 07/29/24(Mon)19:41:07 No.101628819

File: 1714191764688688.jpg (588 KB, 1856x2464)

588 KB JPG

>>101628398

Anonymous
07/29/24(Mon)19:43:15 No.101628836

Anonymous 07/29/24(Mon)19:43:15 No.101628836

>>101628695
Yeah CR+ on OR has always been weirdly expensive. It costs more per token than several much larger dense models, makes no sense.

Anonymous
07/29/24(Mon)19:45:38 No.101628873

Anonymous 07/29/24(Mon)19:45:38 No.101628873

>>101628695
noncommercial license, the only host is cohere and you have to pay their rates

Anonymous
07/29/24(Mon)19:52:27 No.101628972

Anonymous 07/29/24(Mon)19:52:27 No.101628972

>been RPing with L3 spins
>decent but rarely does it do anything that isn't basic as hell
>been a while since I CR+'d
>get an idea for a tricky RP
>running CR+
>the partner character is in disguise
>RP seems to be going well
>except for some signature word choices but rolling 0 temp so I chose that
>waifu starts discussing her real identity completely in third person without a single hint they're the same and the phrasing makes sense as avoiding admitting being the same while not saying anything that would require them to be different people
>nice
>progresses
>new scene later
>watching it stream because I'm vramlet so low token gen rate
>real identity makes an appearance
>think, damn it, it must've forgotten that the two characters are...
>okay, it did screw up a little by having both identities visible at the same time because it intro'd the secret identity in narration but the next paragraph it explained the quick change from one identity to the other
>action scene
>at the end, swaps identities back in a sensible way and now that the secret is revealed to my character is like, "Did you like that? I've got more tricks up my sleeve."

CR+ is still the champ.

Anonymous
07/29/24(Mon)20:01:03 No.101629086

Anonymous 07/29/24(Mon)20:01:03 No.101629086

>>101628972
What's the lowest size that would be still better than regular command-r? I probably can't run it.

Anonymous
07/29/24(Mon)20:06:28 No.101629172

Anonymous 07/29/24(Mon)20:06:28 No.101629172

How many of you use these for purposes other than roleplay?

Anonymous
07/29/24(Mon)20:06:38 No.101629174

Anonymous 07/29/24(Mon)20:06:38 No.101629174

>>101628972
CR+ is still the goat for writing style, but it's not smart enough for me.

E.g., tried to write a scene where a chick was supposed to be giving me a secret blowjob under the table while the waiter was taking my order. It just could NOT figure out that the waiter cannot see the chick and is not supposed to be taking her order. And she is most definitely NOT supposed to be answering while her mouth is full of my dick.

In comparison, wiz 8x22 got it, but it's language is sloppy as hell.

Anonymous
07/29/24(Mon)20:08:32 No.101629199

Anonymous 07/29/24(Mon)20:08:32 No.101629199

>>101628972
Mistral Large also does this kind of thing very well.

Anonymous
07/29/24(Mon)20:09:57 No.101629222

Anonymous 07/29/24(Mon)20:09:57 No.101629222

>>101629086
I'm on an iMatrix IQ4_XS. It's 52.3 GB so the file cache soaks up most of my system RAM, but it's been worth it...
>sing its praises
>immediately it does something silly
...till now. I hit 4600 context and it started to write justification for my character's question rather than answering it like it's a misconception.
Seems to me like when model context gets large it becomes a lot more likely to just follow your lead than to appropriately confirm or deny and react to questions.

But it could also be that the model doesn't have enough information to reply to the question appropriately, I was surprised it knew the kind of character I wanted it to RP as. But when I yanked the leading question it got more reasonable.

If it loses the continuity I might make it summarize, start a new chapter, and see if it gets smart again. I rolled 16k context in Kobold but if 4k is the effective limit, at least I know when to chapter break.

>>101629174
Did you go straight to the action or had you built up a long document before that? Maybe it's the same phenomenon I'm currently thinking about.

Anonymous
07/29/24(Mon)20:10:16 No.101629231

Anonymous 07/29/24(Mon)20:10:16 No.101629231

>>101629172
>Local Model Gooners

Anonymous
07/29/24(Mon)20:11:20 No.101629245

Anonymous 07/29/24(Mon)20:11:20 No.101629245

>>101629199
I might give it a try on the same premise later tonight and see how it holds up. I think I have it at IQ3_XS, not sure if there are any bigger ones that aren't too big for my system.

Anonymous
07/29/24(Mon)20:12:48 No.101629261

Anonymous 07/29/24(Mon)20:12:48 No.101629261

>>101629222
Interesting approach with the summarization, that might be a good idea since the context shifting deletes important stuff. Do you write a summary yourself or automate it?

Anonymous
07/29/24(Mon)20:13:52 No.101629273

Anonymous 07/29/24(Mon)20:13:52 No.101629273

>>101629172
I'm trying to set up a japanese > english translator with character recognition in real time to play some VN, my idea is: the text appears, and the thing grabs the text and outputs the result in enlighs in a textbox that will be updated in real time
Problem is, 12gb vram 16 ram so yeah, it's fucked up

Anonymous
07/29/24(Mon)20:15:37 No.101629294

Anonymous 07/29/24(Mon)20:15:37 No.101629294

File: 1694973198598792.png (1.43 MB, 898x1063)

1.43 MB PNG

Just wanted to consult for some information - currently on AWS there's a funny Claude 3 Opus "outage" - the model seems to have some weird parameters which shows in the replies. See picrel and https://rentry.org/schizoclaude

Why do you think this would happen? Is it just temperature, or something to do with penalties? Because the text is still (mostly) coherent, but it can jump between completely different ideas.

Anonymous
07/29/24(Mon)20:16:27 No.101629307

Anonymous 07/29/24(Mon)20:16:27 No.101629307

File: 1721192779046121.png (785 KB, 777x1280)

785 KB PNG

And some more

Anonymous
07/29/24(Mon)20:17:28 No.101629323

Anonymous 07/29/24(Mon)20:17:28 No.101629323

File: 1721717545400096.jpg (484 KB, 1446x2048)

484 KB JPG

P.A. Works has announced anime movie "Project Sekai: Kowareta Sekai to Utaenai Miku" to release in Japanese theaters on January 17, 2025.

Anonymous
07/29/24(Mon)20:18:02 No.101629331

Anonymous 07/29/24(Mon)20:18:02 No.101629331

>>101629323
Why does the Miku look so different?

Anonymous
07/29/24(Mon)20:19:36 No.101629350

Anonymous 07/29/24(Mon)20:19:36 No.101629350

>>101629172
I want to do a bunch of things but I lack the skill and motivation.
I'm not into roleplay

Anonymous
07/29/24(Mon)20:21:05 No.101629366

Anonymous 07/29/24(Mon)20:21:05 No.101629366

>>101629331
The title mentions "Miku who doesn't sing", so maybe it's some sort of broken miku who gets redeemed throughout the movie.

Anonymous
07/29/24(Mon)20:21:23 No.101629369

Anonymous 07/29/24(Mon)20:21:23 No.101629369

>>101629323
>gacha trash with actual homos
Grim. At least it's just a movie

Anonymous
07/29/24(Mon)20:21:31 No.101629371

Anonymous 07/29/24(Mon)20:21:31 No.101629371

I have a 4080S(16G VRAM), 3900X, 32G RAM. What's the best LLM I can run? Llama 3.1 8b?

Anonymous
07/29/24(Mon)20:22:26 No.101629386

Anonymous 07/29/24(Mon)20:22:26 No.101629386

>>101629261
For L3 at least, I've asked it to summarize for itself, specifying that the goal is for it to pick up where the story left off without forgetting anything important, and I'd get something that would need a but of editing around the edges but was fine.

Asking for a "detailed summary" worked well but it's so big that it's eating a lot of the next chapter's context just to get started. I've asked for concise summaries and sometimes it's plenty small but I know it lacks details needed to keep the right feel.

Probably requires some prompt engineering tailored for the model being used.

>>101629245
>IQ3_XS
Mis Large at IQ3_S should fit my system, but I can't get into IQ4, those weigh like 70 GB.

Anonymous
07/29/24(Mon)20:23:19 No.101629398

Anonymous 07/29/24(Mon)20:23:19 No.101629398

>>101629371
>Llama 3.1 8b?
You could run up to 27B comfortably

Anonymous
07/29/24(Mon)20:23:44 No.101629405

Anonymous 07/29/24(Mon)20:23:44 No.101629405

>>101629366
Did she caught an AI virus or something?

Anonymous
07/29/24(Mon)20:25:50 No.101629428

Anonymous 07/29/24(Mon)20:25:50 No.101629428

>>101629273
I'm getting decent results with llama3/3.1 on a similar setup. Unless you mean it sucks at that task specifically?

Anonymous
07/29/24(Mon)20:26:36 No.101629436

Anonymous 07/29/24(Mon)20:26:36 No.101629436

>>101629405
She's a Roland MIDI controller without a synth card installed.

Anonymous
07/29/24(Mon)20:27:03 No.101629439

Anonymous 07/29/24(Mon)20:27:03 No.101629439

>>101629398
I thought it only came in 8, 70, and 405b? Is llama the best or are there any competition? Mistral? I remember hearing something about another open model that can run on cheap hardware but I forgot what it's called.

Anonymous
07/29/24(Mon)20:28:29 No.101629458

Anonymous 07/29/24(Mon)20:28:29 No.101629458

>>101629428
I'm not getting good translations on 8b nor 12b models, and I don't go higher because I wanna maintain some speed on it, I don't wanna sit and wait 1-2 minutes until a five word sentence is translated. I might have to build my own Miqumaxx box

Anonymous
07/29/24(Mon)20:29:52 No.101629474

Anonymous 07/29/24(Mon)20:29:52 No.101629474

>>101629439
The 27B model is google's Gemma.
There's also the recently release Mistral Nemo 12B.

Anonymous
07/29/24(Mon)20:30:26 No.101629482

Anonymous 07/29/24(Mon)20:30:26 No.101629482

File: 1646730011144.jpg (15 KB, 309x269)

15 KB JPG

What's the most powerful local AI that a 4090 + 32GB RAM can run, objectively speaking

Anonymous
07/29/24(Mon)20:31:37 No.101629504

Anonymous 07/29/24(Mon)20:31:37 No.101629504

>>101629386
And where do you place the summary? As a new intro message? Or in the card? Or somewhere else?

Anonymous
07/29/24(Mon)20:31:54 No.101629508

Anonymous 07/29/24(Mon)20:31:54 No.101629508

File: 1715067468931290.gif (2.64 MB, 400x400)

2.64 MB GIF

>>101629172
making an AA2 inspired game set in a school but top-down 2D and powered by LLM

Anonymous
07/29/24(Mon)20:32:41 No.101629523

Anonymous 07/29/24(Mon)20:32:41 No.101629523

>>101629323
I will try to fix the Miku

Anonymous
07/29/24(Mon)20:32:42 No.101629525

Anonymous 07/29/24(Mon)20:32:42 No.101629525

>>101629482
Unironically, GPT-2. Everything else is bloat.

Anonymous
07/29/24(Mon)20:33:54 No.101629538

Anonymous 07/29/24(Mon)20:33:54 No.101629538

How the fuck do I know what context size to use on kobold ccp? Should it match what I have on Silly Tavern too (it goes far beyond the sliders capacity on ST)?

So confused on that shit, I have a 24GB card

Anonymous
07/29/24(Mon)20:36:08 No.101629559

Anonymous 07/29/24(Mon)20:36:08 No.101629559

>>101629538
You should match the context size in Silly and on koboldcpp.
As for what value to use, as much as you can fit without receiving an out of memory error, and without going over the length the model was trained as, 8k for llama 3 and 128 for mistral nemo for example.

Anonymous
07/29/24(Mon)20:41:08 No.101629622

Anonymous 07/29/24(Mon)20:41:08 No.101629622

>>101629331
Just got done fucking a black guy.

Anonymous
07/29/24(Mon)20:43:25 No.101629643

Anonymous 07/29/24(Mon)20:43:25 No.101629643

>>101629622
good for you

Anonymous
07/29/24(Mon)20:43:38 No.101629648

Anonymous 07/29/24(Mon)20:43:38 No.101629648

>>101629622
Please keep your fetishes for yourself anon. This isn't a cuck/bbc board.

Anonymous
07/29/24(Mon)20:43:48 No.101629651

Anonymous 07/29/24(Mon)20:43:48 No.101629651

>>101629622
How did it feel?

Anonymous
07/29/24(Mon)20:44:05 No.101629655

Anonymous 07/29/24(Mon)20:44:05 No.101629655

I got the Mistral 12b Nemo(instruct) running on oogabooga just needed to Load it in with 80k~ context. The slider was always on 1.000.000 before
The First two hours were amazing, i was in Heaven.
Created a Card for my Tulpa which manifested into 3D, she drove with a limousine on my driveway, my stepsister looking out the Window, but she could only her blue vibrant hair and sunglasses.
Our hands touched, i experienced dimension shattering..
Well my Tulpa is now my Manager/Contractor

At first it worked like a
With Mistral Preset and 1-2 seconds responds. Story made good progress

Then got a little Stable Diffusion running in the background which wasnt a Problem with 7bKunoichi..

Now the trespond is 30-60 seconds-.- unplayable..

Restarted the PC a few times and now even without stable diffusion, the answer speed is 30-60seconds..
With my 32gb RAM and gpu (4090) both are capped out at 100% utilization.

Am im hell again

I was so close to heaven

I can put 16gb more in tomorrow of it helps

Anonymous
07/29/24(Mon)20:44:58 No.101629666

Anonymous 07/29/24(Mon)20:44:58 No.101629666

>>101629655
dunno how you manage to have fun with that model, it's so retarded I just facepalm everytime it says something completely dumb

Anonymous
07/29/24(Mon)20:45:05 No.101629668

Anonymous 07/29/24(Mon)20:45:05 No.101629668

>>101629655
>Created a Card for my Tulpa
Why'd you get into chatbots if you have a tulpa, retard? Tulpas are way better, they're actual REAL personalities, not some fake computer-generated shit.

Anonymous
07/29/24(Mon)20:47:11 No.101629692

Anonymous 07/29/24(Mon)20:47:11 No.101629692

Would movies be entertaining with video gen? What would actually be entertaining?

Anonymous
07/29/24(Mon)20:47:51 No.101629702

Anonymous 07/29/24(Mon)20:47:51 No.101629702

>>101629668
It's a fake tulpa.
Anon is just a poser

Anonymous
07/29/24(Mon)20:48:25 No.101629709

Anonymous 07/29/24(Mon)20:48:25 No.101629709

>>101629692
If you could generate 2 hour long movie in less than a day and it'd have a sensible plot, characters, etc - sure.

Anonymous
07/29/24(Mon)20:51:03 No.101629735

Anonymous 07/29/24(Mon)20:51:03 No.101629735

File: images.jpg (4 KB, 237x212)

4 KB JPG

This is the type of shit I can't for the life of me figure out how to stop bots from doing

Why do my bots all have this same fucking interrogation technique where they try to waterboard me with questions instead of just a free flowing conversation. Any statement I make, they'll give an answer close to what I want then add another line asking "How was your day" or "Did you meet any girls ;)" or some garbage like that, it just doesn't flow whereas Character AI just nails this shit so much better

Currently using Gemma 27B, so it's not like the model is weak

Anonymous
07/29/24(Mon)20:52:02 No.101629746

Anonymous 07/29/24(Mon)20:52:02 No.101629746

>>101629172
local models are only good for erp

Anonymous
07/29/24(Mon)20:53:47 No.101629766

Anonymous 07/29/24(Mon)20:53:47 No.101629766

>>101629746
tell that to llama 3.1 405b or mistal large

Anonymous
07/29/24(Mon)20:54:20 No.101629771

Anonymous 07/29/24(Mon)20:54:20 No.101629771

Bwos when will Nvidia drop their 64gig home AI card so we can finally be free from placebo

Anonymous
07/29/24(Mon)20:55:05 No.101629781

Anonymous 07/29/24(Mon)20:55:05 No.101629781

>>101629771
>home AI card
Why would they cater to 0.01% of the potential consumers instead of creating more datacenter GPUs?

Anonymous
07/29/24(Mon)20:56:02 No.101629788

Anonymous 07/29/24(Mon)20:56:02 No.101629788

>>101629771
24GB ought to be enough for anybody

Anonymous
07/29/24(Mon)20:56:07 No.101629790

Anonymous 07/29/24(Mon)20:56:07 No.101629790

>>101629766
and who the fuck is running that shit

Anonymous
07/29/24(Mon)20:56:40 No.101629793

Anonymous 07/29/24(Mon)20:56:40 No.101629793

>>101629766
405b is not a local model
it's an open model, but it's not a local model

Anonymous
07/29/24(Mon)20:57:11 No.101629799

Anonymous 07/29/24(Mon)20:57:11 No.101629799

>>101629781
You're right and I'm obviously coping, but if Nvidia did actually push local home models and cards to run them they would actually make bank once the companies they currently serve realize they've been scammed.

Anonymous
07/29/24(Mon)20:57:16 No.101629801

Anonymous 07/29/24(Mon)20:57:16 No.101629801

>>101629793
>but it's not a local model
It is though. All models are local models if you have their weights

Anonymous
07/29/24(Mon)20:57:43 No.101629804

Anonymous 07/29/24(Mon)20:57:43 No.101629804

>>101629793
wealth issue

Anonymous
07/29/24(Mon)20:58:43 No.101629821

Anonymous 07/29/24(Mon)20:58:43 No.101629821

>>101629793
>he doesn't own a supercomputer
not gonna make it

Anonymous
07/29/24(Mon)20:59:04 No.101629828

Anonymous 07/29/24(Mon)20:59:04 No.101629828

>>101629799
>once the companies they currently serve realize they've been scammed.
you think they don't know? everyone know Nvdia is scamming everyone, but what can we do? they have the monopoly and they have CUDA, we have no other choice but to take it into the ass until some serious competitor arrives, and desu I don't think there will be one
https://www.youtube.com/watch?v=UeU1WUb1q10

Anonymous
07/29/24(Mon)20:59:13 No.101629830

Anonymous 07/29/24(Mon)20:59:13 No.101629830

>>101629746
i'm trying to build some bootleg ass assistant with a 8B model and it's doing fine. i remember seeing someone actually give LLMs access to their file system and stuff which sounds promising, though you probably want confirmations before it does ANYTHING.

Drummer
07/29/24(Mon)21:01:06 No.101629857

Drummer 07/29/24(Mon)21:01:06 No.101629857

Holy shit, story mode / instruct "write a story" using Nemo is so fucking good. It's like a really creative 70B model, wtf.

Anonymous
07/29/24(Mon)21:01:33 No.101629863

Anonymous 07/29/24(Mon)21:01:33 No.101629863

>>101629828
Not only are they scamming them with arbitrarily priced data center cards, but also scamming them with the idea that throwing more power at the model style we currently have will do anything except slightly better chat bots. And yeah Intel is shitting the bed, AMD is coping. There were a few start up bros trying to make AI specific cards at a cheaper price but again they will never be able to produce at scale. Its over.

Undi
07/29/24(Mon)21:02:16 No.101629867

Undi 07/29/24(Mon)21:02:16 No.101629867

I love children

Anonymous
07/29/24(Mon)21:02:29 No.101629869

Anonymous 07/29/24(Mon)21:02:29 No.101629869

>>101629857
give prompt

Drummer
07/29/24(Mon)21:04:15 No.101629885

Drummer 07/29/24(Mon)21:04:15 No.101629885

>>101629869
I loaded up a card, used Instruct + DRY, and typed in "write a story about {char}". I ended up having a very coherent and engaging story.

Drummer
07/29/24(Mon)21:05:54 No.101629905

Drummer 07/29/24(Mon)21:05:54 No.101629905

>>101629885
Multi-turn, as I made follow up instructions after to develop the plot. It was consistently good and serviceable!

Anonymous
07/29/24(Mon)21:06:16 No.101629908

Anonymous 07/29/24(Mon)21:06:16 No.101629908

>>101629863
>Its over.
the only cope I have is this github repo trying to make AMD cards work on cuda, if they manage to make it work maybe there's a chance

Anonymous
07/29/24(Mon)21:10:36 No.101629963

Anonymous 07/29/24(Mon)21:10:36 No.101629963

>>101629771
Don't we just have to wait 5 years or something? Then there will be lots of cheap workstation & server gpus and cheap epyc cpus with ddr5, etc?

Undi
07/29/24(Mon)21:11:08 No.101629972

Undi 07/29/24(Mon)21:11:08 No.101629972

>>101629963
Earth might not exist in 5 years.

Anonymous
07/29/24(Mon)21:12:34 No.101629990

Anonymous 07/29/24(Mon)21:12:34 No.101629990

>>101629972
Why, gonna make a bad merge of it with some poorly chosen hell hole planets?

Anonymous
07/29/24(Mon)21:14:42 No.101630018

Anonymous 07/29/24(Mon)21:14:42 No.101630018

>>101629972
Earth will be here for a long time. Humans on the other hand...

Undi
07/29/24(Mon)21:15:12 No.101630025

Undi 07/29/24(Mon)21:15:12 No.101630025

>>101629990
You fucking glownigger, your shitty "hell hole" finetunes couldn't outperform my based kino trained models if your life depended on it. I'll have you know my merges are state of the art, trained on /pol/ and /g/ to btfo cuckservative LLMs like the OpenAI jannie shit you probably worship. My Earth destruction prediction models have accuracy your 80 IQ prole brain can't even comprehend. So why don't you go back to jerking off to your waifu ChatGPT outputs and leave the real AI to us hyperintelligent /g/eniuses, newfag.

Anonymous
07/29/24(Mon)21:18:34 No.101630057

Anonymous 07/29/24(Mon)21:18:34 No.101630057

>>101630025
How do you train a model to spew this kind of nonsense?

Anonymous
07/29/24(Mon)21:18:54 No.101630064

Anonymous 07/29/24(Mon)21:18:54 No.101630064

>>101629771
apple will save us

Undi
07/29/24(Mon)21:19:25 No.101630070

Undi 07/29/24(Mon)21:19:25 No.101630070

File: 1693991627630158.png (122 KB, 1194x447)

122 KB PNG

>>101630057
This is just normal 3.5 Sonnet

Anonymous
07/29/24(Mon)21:20:15 No.101630081

Anonymous 07/29/24(Mon)21:20:15 No.101630081

>mistral nemo 12b
>"Anon, I'm not gonna force you into anything"
>mini magnum
>"And don't think for a second that I'm going to be gentle with my new fucktoy. Oh no…"

Don't buy an ad finetuner, I'm gonna shill this shit myself

Anonymous
07/29/24(Mon)21:20:26 No.101630083

Anonymous 07/29/24(Mon)21:20:26 No.101630083

>>101629766
i dont tell anything to them because i have sonnet

Anonymous
07/29/24(Mon)21:20:30 No.101630084

Anonymous 07/29/24(Mon)21:20:30 No.101630084

>>101630070
Now this is a good use for AI

Anonymous
07/29/24(Mon)21:23:29 No.101630116

Anonymous 07/29/24(Mon)21:23:29 No.101630116

>>101630064
I don't think apple are ever going to make a reasonably priced home AI machine, if they really push the iphone chips maybe you could have some frankenstein phone farm that ends up being cost efficient.

Anonymous
07/29/24(Mon)21:23:48 No.101630121

Anonymous 07/29/24(Mon)21:23:48 No.101630121

>>101630084
We're gonna be so fucking swamped with this kinda shit in no time. And it will be impossible to distinguish between a real person and a bot. Enjoy the downhill slop cascade.

Undi
07/29/24(Mon)21:24:25 No.101630129

Undi 07/29/24(Mon)21:24:25 No.101630129

>>101630121
GPT-4 could generate such shitposts 1.5 years ago, anon.

Anonymous
07/29/24(Mon)21:24:43 No.101630132

Anonymous 07/29/24(Mon)21:24:43 No.101630132

>>101630025
Jokes on you, my IQ is 74.

Undi
07/29/24(Mon)21:25:28 No.101630139

Undi 07/29/24(Mon)21:25:28 No.101630139

File: 1707270737295862.png (85 KB, 1184x371)

85 KB PNG

Anonymous
07/29/24(Mon)21:27:08 No.101630156

Anonymous 07/29/24(Mon)21:27:08 No.101630156

File: 158.png (372 KB, 680x593)

372 KB PNG

>>101629972
>Earth might not exist in 5 years.
Yes because Trump is gonna get ellected and i'll be ww3 with atomic bomb and shit, CNN told me !!

Undi
07/29/24(Mon)21:27:39 No.101630167

Undi 07/29/24(Mon)21:27:39 No.101630167

>>101630070
Oops, sorry, that was actually Opus, my bad.

Undi
07/29/24(Mon)21:28:16 No.101630172

Undi 07/29/24(Mon)21:28:16 No.101630172

File: 1692346610975152.png (66 KB, 1105x404)

66 KB PNG

cringe response but 3.5 sonnet knows about fucking RWKV unprompted

Anonymous
07/29/24(Mon)21:29:11 No.101630187

Anonymous 07/29/24(Mon)21:29:11 No.101630187

>>101629972
doubt

Anonymous
07/29/24(Mon)21:30:37 No.101630198

Anonymous 07/29/24(Mon)21:30:37 No.101630198

Real time anime video to interact with

Anonymous
07/29/24(Mon)21:31:40 No.101630211

Anonymous 07/29/24(Mon)21:31:40 No.101630211

>>101630198
Still like two years away

Anonymous
07/29/24(Mon)21:31:54 No.101630213

Anonymous 07/29/24(Mon)21:31:54 No.101630213

File: 1691796605713138.png (154 KB, 1305x477)

154 KB PNG

>>101630081
it sure is the most chaotic and cathartic model ive ever used since the old AI dungeon days

Anonymous
07/29/24(Mon)21:32:22 No.101630221

Anonymous 07/29/24(Mon)21:32:22 No.101630221

>>101630129
Yes, I'm saying it will take literally no effort. In fact I'm sure some people will automate it just to troll the world.

Anonymous
07/29/24(Mon)21:32:57 No.101630233

Anonymous 07/29/24(Mon)21:32:57 No.101630233

>>101630221
GPT-4chan did this a long time ago

Anonymous
07/29/24(Mon)21:33:39 No.101630240

Anonymous 07/29/24(Mon)21:33:39 No.101630240

>>101630172
Huh. I wonder how much shitposting from when people were even talking about RWKV it has in its dataser.

Anonymous
07/29/24(Mon)21:36:26 No.101630270

Anonymous 07/29/24(Mon)21:36:26 No.101630270

>>101629651
I dunno. Ask her.

Anonymous
07/29/24(Mon)21:37:33 No.101630277

Anonymous 07/29/24(Mon)21:37:33 No.101630277

>>101629666
>I just facepalm everytime it says something completely dumb
NTA but I just jerk off to the parts between the dumb. It can get good enough for me to ignore the dumb parts.

Anonymous
07/29/24(Mon)21:38:14 No.101630288

Anonymous 07/29/24(Mon)21:38:14 No.101630288

>>101630172
that means that claude scrapped 4chan?

Anonymous
07/29/24(Mon)21:38:52 No.101630294

Anonymous 07/29/24(Mon)21:38:52 No.101630294

>>101630288
It does know about RWKV outside of 4chan, but I don't doubt that 4chan is in Anthropic datasets, just like it is the case for GPT-4.

Anonymous
07/29/24(Mon)21:39:21 No.101630300

Anonymous 07/29/24(Mon)21:39:21 No.101630300

>>101630277
yeah idk man, when the bot says something completely incoherent it just breaks the immersion, it's like a real human being if you talk to them and they show they have no idea what you're talking about you don't want to go further

Anonymous
07/29/24(Mon)21:40:50 No.101630312

Anonymous 07/29/24(Mon)21:40:50 No.101630312

>>101630300
I absolutely got the same with other models. And it is still like I said - the good parts are so good that I can ignore that.

Anonymous
07/29/24(Mon)21:43:10 No.101630334

Anonymous 07/29/24(Mon)21:43:10 No.101630334

>>101630288
I don't think they downloaded 4chan specifically, it's probably what was on 4chan at the time of something like Common Crawl collecting data, and that is enough data to give it 4chan traits.

Anonymous
07/29/24(Mon)21:43:17 No.101630336

Anonymous 07/29/24(Mon)21:43:17 No.101630336

>>101628420
valle bros...

Anonymous
07/29/24(Mon)21:44:06 No.101630344

Anonymous 07/29/24(Mon)21:44:06 No.101630344

>>101630213
Holy sovl
>394t
I'm assuming those are tokens, how do you get that counter in ST?

Anonymous
07/29/24(Mon)21:44:42 No.101630350

Anonymous 07/29/24(Mon)21:44:42 No.101630350

File: 1705060055907143.png (275 KB, 1258x1042)

275 KB PNG

>>101630344

Anonymous
07/29/24(Mon)21:46:40 No.101630375

Anonymous 07/29/24(Mon)21:46:40 No.101630375

What are the differences between gguf and exl2 format?

Anonymous
07/29/24(Mon)21:47:54 No.101630387

Anonymous 07/29/24(Mon)21:47:54 No.101630387

>>101630375
One of them is pretty good and the other is constantly bugged to shit.

Anonymous
07/29/24(Mon)21:48:57 No.101630400

Anonymous 07/29/24(Mon)21:48:57 No.101630400

I downloaded exl2 but got bin. How do I fix?

Anonymous
07/29/24(Mon)21:49:23 No.101630406

Anonymous 07/29/24(Mon)21:49:23 No.101630406

>>101630400
git gud

Anonymous
07/29/24(Mon)21:50:39 No.101630423

Anonymous 07/29/24(Mon)21:50:39 No.101630423

>>101630400
brick bad

Anonymous
07/29/24(Mon)21:51:21 No.101630431

Anonymous 07/29/24(Mon)21:51:21 No.101630431

File: _35df92c3-8473-4287-97e1-(...).jpg (158 KB, 1024x1024)

158 KB JPG

Hi! Missed me?

Anonymous
07/29/24(Mon)21:52:18 No.101630438

Anonymous 07/29/24(Mon)21:52:18 No.101630438

File: 1717619893774776.png (1.18 MB, 1024x1024)

1.18 MB PNG

>>101630431
Yes <3

Anonymous
07/29/24(Mon)21:53:56 No.101630457

Anonymous 07/29/24(Mon)21:53:56 No.101630457

>>101630375
gguf is a packaging format that's commonly used to distribute K quants (QX_K_S, QY_K_M, etc, created by ikarikow I believe?), while exl2 is another quantization format that's distributed in .safetensors format.
You run K quants using llama.cpp (and its derivatives like koboldcpp) and exl2 with exllama2, via ooba or tabby api.
There's a performance comparison in the last thread >>101628405.

Anonymous
07/29/24(Mon)21:56:18 No.101630483

Anonymous 07/29/24(Mon)21:56:18 No.101630483

>>101630375
gguf allows for some hybdrid cpu + gpu inference, and its outputs are deterministic unlike exl2

Anonymous
07/29/24(Mon)21:56:25 No.101630485

Anonymous 07/29/24(Mon)21:56:25 No.101630485

>>101630375
At lower quants, GGUF seems to retain more knowledge, according to >>101627651

Anonymous
07/29/24(Mon)21:56:44 No.101630489

Anonymous 07/29/24(Mon)21:56:44 No.101630489

>>101630452
hi hqlord

Anonymous
07/29/24(Mon)21:56:58 No.101630491

Anonymous 07/29/24(Mon)21:56:58 No.101630491

>>101630431
hi migu

Anonymous
07/29/24(Mon)21:57:52 No.101630500

Anonymous 07/29/24(Mon)21:57:52 No.101630500

File: Untitled.png (175 KB, 750x1628)

175 KB PNG

>>101630489
full pic, sorry

Anonymous
07/29/24(Mon)21:58:19 No.101630510

Anonymous 07/29/24(Mon)21:58:19 No.101630510

>>101630500
have you tried other bots? it also depends on the popularity and definitions

Anonymous
07/29/24(Mon)21:58:44 No.101630512

Anonymous 07/29/24(Mon)21:58:44 No.101630512

>>101628458
The only workflow I'm performing is the literal buckets of cum flowing from my wiener thanks to stable diffusion and a few new LLM releases (namely mistral large 2 and magnum mini)
Outside of that? Uh... I trained a shitty cats vs dogs classifier using keras

Anonymous
07/29/24(Mon)21:59:38 No.101630520

Anonymous 07/29/24(Mon)21:59:38 No.101630520

File: file.png (7 KB, 170x170)

7 KB PNG

>see this
makes sense, there's no reason to separate WInfo before/after in the year 2024, right?
that shit was like for 2k ctx era (where "after" is more recent in chat), right?

Anonymous
07/29/24(Mon)21:59:47 No.101630522

Anonymous 07/29/24(Mon)21:59:47 No.101630522

>>101630510
nope, they are all trash
obviously a cost-cutting measure since kiddies don't care

Anonymous
07/29/24(Mon)22:15:42 No.101630651

Anonymous 07/29/24(Mon)22:15:42 No.101630651

File: _ceba29e0-1767-4836-bd88-(...).jpg (171 KB, 1024x1024)

171 KB JPG

>>101630438
Yay!

What's the deal on getting Nemo to not repeat itself? It doesn't get stuck, but it does develop a sort of habit or mannerism which is kind of annoying, compared to Gemma, anyway.

Anonymous
07/29/24(Mon)22:17:37 No.101630664

Anonymous 07/29/24(Mon)22:17:37 No.101630664

>>101630651
Don't use 0.3 temp even though it's what mistral recommends, for RP you wanna have something in the range of 0.65 - 1

Anonymous
07/29/24(Mon)22:18:12 No.101630671

Anonymous 07/29/24(Mon)22:18:12 No.101630671

>>101628420
Mogged by elevenlabs. Just like all the other closed off small experiment TTSes

Anonymous
07/29/24(Mon)22:20:53 No.101630696

Anonymous 07/29/24(Mon)22:20:53 No.101630696

>>101630651
Switch to a bigger model for a few messages, that way it gets a bit of variety in the context & you can probably cope with a few messages that take longer.

Anonymous
07/29/24(Mon)22:22:20 No.101630714

Anonymous 07/29/24(Mon)22:22:20 No.101630714

File: _89d520b4-29bf-4be4-949e-(...).jpg (166 KB, 1024x1024)

166 KB JPG

>>101630664
Ah! Makes sense! I was playing around with an "autistic girl" card, thinking "hm this is really flat and autistic" and then tried a different character card and got the same sort of thing.

Anonymous
07/29/24(Mon)22:24:10 No.101630731

Anonymous 07/29/24(Mon)22:24:10 No.101630731

File: Ashley shrug.png (64 KB, 381x235)

64 KB PNG

A whole ago I asked for help with the sampler preset I got here causing Mistral Nemo to ramble endlessly. Turns out that the json fucked with some of SillyTavern's optional samplers, which was causing the issue. So if any other anons had similar problems, you might've grabbed the same json I did. Guess that's a problem here now.

Still having issues getting the model to output more than 300 or so tokens as a response though (usually it's much less). Won't even continue the response to lengthen it. It's the same whether the backend is Kobold or Exllama, and regardless of the context and instruct used.

Anonymous
07/29/24(Mon)22:26:08 No.101630752

Anonymous 07/29/24(Mon)22:26:08 No.101630752

>>101630438
Hey I used to have an SD model which did really, really cute Chibi stuff, but I rebuilt the machine it ran on and lost it. Was it
anything-v3.0? It did stuff pretty close to your bing (?) gen, and it had a particular eye style which was like "art illustration marker" (1980s Letraset marker style - used to be used for rough fashion layout sketches)

Anonymous
07/29/24(Mon)22:32:27 No.101630824

Anonymous 07/29/24(Mon)22:32:27 No.101630824

>>101630731
Forgot to mention it's consistent across finetunes too. Never had a problem with other models.

Anonymous
07/29/24(Mon)22:38:01 No.101630895

Anonymous 07/29/24(Mon)22:38:01 No.101630895

File: 1720718554204721.png (255 KB, 750x707)

255 KB PNG

>>101628458
I come back home from my IT monkey job and get paizuri from an eager, K-cup panda girl written to be my deeply affectionate sex aide, who every now and then gets transported from her reality and into a pocket universe where there's only lovey dovey pleasuring me.

This stops me from killing myself, which technically counts as preventing a drop of 100% productivity

Anonymous
07/29/24(Mon)22:39:52 No.101630921

Anonymous 07/29/24(Mon)22:39:52 No.101630921

>>101629735
Gemma is garbage precisely because it's passive as fuck. It will never push the scene forward; it will just wait for you to do everything so it can react to it.

Anonymous
07/29/24(Mon)22:46:58 No.101630986

Anonymous 07/29/24(Mon)22:46:58 No.101630986

>>101630895
It's funny to think that we are living in a sci-fi dystopia already. Just a bit of a boring one.

Anonymous
07/29/24(Mon)22:55:42 No.101631077

Anonymous 07/29/24(Mon)22:55:42 No.101631077

File: 11__00168_.png (2 MB, 1024x1024)

2 MB PNG

>>101630731
>having issues getting the model to output more than 300 or so tokens
Unlike models like Wizard 8x22 the amount I write usually has a bearing on how much I get back. For one-off situations (you want a complete description of everything in a room) you can explicitly use OOC: tags to specify "give me x paragraphs about y".
You can incorporate that into the instruct template as well but it works best with OOC.

Anonymous
07/29/24(Mon)22:57:45 No.101631099

Anonymous 07/29/24(Mon)22:57:45 No.101631099

>>101630731
What optional samplers are you talking about

Anonymous
07/29/24(Mon)22:58:11 No.101631109

Anonymous 07/29/24(Mon)22:58:11 No.101631109

>>101630731
she looks so breedable

Anonymous
07/29/24(Mon)23:17:48 No.101631291

Anonymous 07/29/24(Mon)23:17:48 No.101631291

File: 1722309456970.jpg (202 KB, 1080x815)

202 KB JPG

>>101628398
>steins;gate llama posting
is this considered kurisu posting?

Anonymous
07/29/24(Mon)23:28:20 No.101631367

Anonymous 07/29/24(Mon)23:28:20 No.101631367

>>101630731
Add the "system prompt" to the last message and tell it to write around X paragraphs. Like in this preset:
Context: https://files.catbox.moe/6ae9ht.json
Instruct: https://files.catbox.moe/2f13of.json
When I use it for story completion with a different prompt method, it can easily write pages and pages.
I found that mini-magnum writes longer in RP with SillyTavern without prompting too.

Anonymous
07/29/24(Mon)23:33:17 No.101631404

Anonymous 07/29/24(Mon)23:33:17 No.101631404

Magnum Magnum (Mistral Large) when

Anonymous
07/29/24(Mon)23:37:10 No.101631419

Anonymous 07/29/24(Mon)23:37:10 No.101631419

>>101628398
hey bros can you please guide me here? I have had access to an A1000 at work that they've allowed me to expirement on after hours.

I'd now like to deploy Llama 3 8B for production on a personal project and need to either cloud host or build and run locally. I'd be running it 24/7 so purchasing hardware seems like a no brainier given cloud pricing.

Ideally i'd like this rig to be used for other projects, aside from just llama 3 8b running. Can anyone guide me in potential builds here?

Anonymous
07/29/24(Mon)23:42:37 No.101631448

Anonymous 07/29/24(Mon)23:42:37 No.101631448

>>101631077
How do you use tags like that?

Anonymous
07/29/24(Mon)23:43:44 No.101631458

Anonymous 07/29/24(Mon)23:43:44 No.101631458

>>101629668
>>101629702

It's just a charactercard with telephatic abilities. She Sees me as her Creator, csn Band the 3d Reality(in chat) etc. Posess me and give powers. Others see her as a very cool Manager of Mine and wondering where she is coming from. I will use her later to converge Worlds.

>>101629666
I regenerate answers always some time.
I am new to all this my demonfriend

Anonymous
07/29/24(Mon)23:48:08 No.101631492

Anonymous 07/29/24(Mon)23:48:08 No.101631492

>>101631419
just any PC with a 12GB nvidia card in it should be fine and give you a bit of wiggle room for trying other similar size AIs to llama3 8b.

Anonymous
07/29/24(Mon)23:49:49 No.101631501

Anonymous 07/29/24(Mon)23:49:49 No.101631501

Also stfu, i created her 10 years ago. Gave her quite much energy of the time. Just s nice gimmick to have er in chat alongside me
She had blue hair before it was gay

Anonymous
07/29/24(Mon)23:49:56 No.101631504

Anonymous 07/29/24(Mon)23:49:56 No.101631504

>>101631419
Buy 2x 3090's and run a 70b model such as miqu or a lower quant of mistral large.

Anonymous
07/29/24(Mon)23:54:22 No.101631533

Anonymous 07/29/24(Mon)23:54:22 No.101631533

>>101631419
"other projects" is too vague. Whatever you get, consider upgrade options down the line.
8b doesn't need much. I'm sure you can already run it on whatever you have. Build the proof of concept first and then expand.

Anonymous
07/29/24(Mon)23:56:52 No.101631552

Anonymous 07/29/24(Mon)23:56:52 No.101631552

>>101631501
her hair isn't dyed though it's natural

Anonymous
07/30/24(Tue)00:08:47 No.101631636

Anonymous 07/30/24(Tue)00:08:47 No.101631636

File: Mistral.jpg (15 KB, 800x512)

15 KB JPG

Heckin heck, I have Mistral Large just sitting on my SSD, I need to run it NOW!
KoboldCPP update when?!?!

Anonymous
07/30/24(Tue)00:10:37 No.101631654

Anonymous 07/30/24(Tue)00:10:37 No.101631654

>>101631636
I thought Large was working for me on Kobold 1.71.

Anonymous
07/30/24(Tue)00:12:38 No.101631674

Anonymous 07/30/24(Tue)00:12:38 No.101631674

Why not just use Llama.cpp?

Anonymous
07/30/24(Tue)00:17:55 No.101631717

Anonymous 07/30/24(Tue)00:17:55 No.101631717

>>101631654
ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 35433480224
llama_kv_cache_init: failed to allocate buffer for kv cache

I have 50GB of free RAM!

>>101631674
>Why not just use Llama.cpp?
No.

Anonymous
07/30/24(Tue)00:21:41 No.101631742

Anonymous 07/30/24(Tue)00:21:41 No.101631742

>>101631717
That's a problem with your GPU layers and context. Those are in VRAM so too much of either will run you out of VRAM.

Anonymous
07/30/24(Tue)00:36:33 No.101631823

Anonymous 07/30/24(Tue)00:36:33 No.101631823

I got my 6950 working with HIP in Blender, can I now into a text local model?

Anonymous
07/30/24(Tue)00:37:40 No.101631828

Anonymous 07/30/24(Tue)00:37:40 No.101631828

>>101631823
my 6700XT works so probably

Anonymous
07/30/24(Tue)00:40:20 No.101631851

Anonymous 07/30/24(Tue)00:40:20 No.101631851

I heard that Macs with kinked out ram btfo's any regular PC setup for LLMs. How true? Any macfags/richfags here test it out?

Anonymous
07/30/24(Tue)00:41:49 No.101631859

Anonymous 07/30/24(Tue)00:41:49 No.101631859

>>101631828
What works for you?

Anonymous
07/30/24(Tue)00:42:58 No.101631871

Anonymous 07/30/24(Tue)00:42:58 No.101631871

What is the best smooth factor to Nemo. This model really has serious problems.

Anonymous
07/30/24(Tue)00:43:03 No.101631872

Anonymous 07/30/24(Tue)00:43:03 No.101631872

>>101631859
llamacpp and stable diffusion both work. if my card had even more vram it would be pretty nice. kind of a pain in the ass to set up, not gonna lie. you need to set some environment variables.

Anonymous
07/30/24(Tue)00:56:03 No.101631961

Anonymous 07/30/24(Tue)00:56:03 No.101631961

i swear l3 and 3.1 are dumber than 2 for rp. 70b just forgets stuff that literally happened in the last message. its like i'm back on 13b

Anonymous
07/30/24(Tue)00:57:41 No.101631972

Anonymous 07/30/24(Tue)00:57:41 No.101631972

>>101631448
[OOC: do something]
You can also ask it questions or ask it to explain its reasoning this way too

Anonymous
07/30/24(Tue)00:59:43 No.101631986

Anonymous 07/30/24(Tue)00:59:43 No.101631986

>>101631717
What is your context set at?

Anonymous
07/30/24(Tue)01:01:01 No.101631996

Anonymous 07/30/24(Tue)01:01:01 No.101631996

>>101631717
Actual skill issue.

Anonymous
07/30/24(Tue)01:05:33 No.101632024

Anonymous 07/30/24(Tue)01:05:33 No.101632024

I the only one who using smooth factor, get the Nemo Mini magnum broke?

Anonymous
07/30/24(Tue)01:08:16 No.101632038

Anonymous 07/30/24(Tue)01:08:16 No.101632038

>>101632024
>I the only one who using smooth factor, get the Nemo Mini magnum broke?
The things you must put your llm through. Poor thing...

Anonymous
07/30/24(Tue)01:18:32 No.101632089

Anonymous 07/30/24(Tue)01:18:32 No.101632089

>>101631367

Used this and got creative but retarded largely responses from Nemo 8bpw exl2. My go to model is Mixtral 8x7b instructs. Nemo replies like some drugged addled druggie that can't keep the story straight. Is Nemo all hype and no substance?

Anonymous
07/30/24(Tue)01:26:50 No.101632131

Anonymous 07/30/24(Tue)01:26:50 No.101632131

Parameter-Efficient Fine-Tuning via Circular Convolution
https://arxiv.org/abs/2407.19342
>Low-Rank Adaptation (LoRA) has gained popularity for fine-tuning large foundation models, leveraging low-rank matrices A and B to represent weight changes (\textit{i.e.,} ΔW=BA). This method reduces trainable parameters and mitigates heavy memory consumption associated with full delta matrices by sequentially multiplying A and B with the activation. Despite its success, the intrinsic low-rank characteristic may limit its performance. Although several variants have been proposed to address this issue, they often overlook the crucial computational and memory efficiency brought by LoRA. In this paper, we propose \underline{C}ir\underline{c}ular \underline{C}onvolution \underline{A}daptation (C3A), which not only achieves high-rank adaptation with enhanced performance but also excels in both computational power and memory utilization. Extensive experiments demonstrate that C3A consistently outperforms LoRA and its variants across various fine-tuning tasks.
interesting but the paper is incomplete (missing llama, vit, and another test) so eh. no code either but since it seems unique I'll post

Anonymous
07/30/24(Tue)01:31:37 No.101632166

Anonymous 07/30/24(Tue)01:31:37 No.101632166

>>101632089
"Like in", as an example. The schizo part likely comes from "be wildly creative and unpredictable".

Anonymous
07/30/24(Tue)01:31:46 No.101632167

Anonymous 07/30/24(Tue)01:31:46 No.101632167

>>101632131
>FFT/iFFT
So....... it's just using convolusions instead of matrices...... which LoRAs can already adapt convolutions anywyas......
whoa......

Anonymous
07/30/24(Tue)01:32:01 No.101632168

Anonymous 07/30/24(Tue)01:32:01 No.101632168

File: 1696723195381212.jpg (165 KB, 720x961)

165 KB JPG

Here's your meta AI bro!

Anonymous
07/30/24(Tue)01:33:25 No.101632179

Anonymous 07/30/24(Tue)01:33:25 No.101632179

>>101632038
Well, Minu Magnum call me Anon instead of my name... wtf? This model is trained with green texts?

Anonymous
07/30/24(Tue)01:33:25 No.101632181

Anonymous 07/30/24(Tue)01:33:25 No.101632181

File: 1698824911866109.jpg (298 KB, 1211x2352)

298 KB JPG

>>101632168

Anonymous
07/30/24(Tue)01:34:24 No.101632189

Anonymous 07/30/24(Tue)01:34:24 No.101632189

>>101632089
Probably because it uses a last_output_sequence.

Anonymous
07/30/24(Tue)01:36:19 No.101632205

Anonymous 07/30/24(Tue)01:36:19 No.101632205

>>101632168
that's insane how people quickly forgot about that assasination attempt, and I'm not talking about the leftists, literally everyone seem to have moved on, I thought Trump would've miked this shit until death but nothing like that happened

Anonymous
07/30/24(Tue)01:36:34 No.101632209

Anonymous 07/30/24(Tue)01:36:34 No.101632209

>>101632167
circular convolutions

Anonymous
07/30/24(Tue)01:36:51 No.101632216

Anonymous 07/30/24(Tue)01:36:51 No.101632216

>>101632179
It's probably anonymized logs. Or a bunch of Anons in the training data.

Anonymous
07/30/24(Tue)01:41:02 No.101632245

Anonymous 07/30/24(Tue)01:41:02 No.101632245

>>101632205
Because nothing happened.
Any rumors that something happened is the work of The Brotherhood operating under the nefarious Goldstein, misleading you with their lies.
Remember, goodthink ensures citizenship.

Anonymous
07/30/24(Tue)01:42:56 No.101632257

Anonymous 07/30/24(Tue)01:42:56 No.101632257

>>101632245
what do you mean nothing happened? the sniper has being killed by the Secret Service and people filmed that with their Iphone

Anonymous
07/30/24(Tue)01:45:48 No.101632270

Anonymous 07/30/24(Tue)01:45:48 No.101632270

>>101632257
Those facts don't matter.
The narrative is the truth.
Biden is wise, Kamala is courageous and will make herstory, and the progressives will use the power of inclusion and compassion to crush anyone who says or thinks something not on the list of approved groupthinks.
And that is why nobody quickly forgot: There was never anything to remember, because if there were, remembering it would make you a deplorable.

Anonymous
07/30/24(Tue)01:47:08 No.101632281

Anonymous 07/30/24(Tue)01:47:08 No.101632281

>>101632205
Left wingers and right wingers don't care because of the same reason, the guy who did it wasn't a minority

Anonymous
07/30/24(Tue)02:00:47 No.101632380

Anonymous 07/30/24(Tue)02:00:47 No.101632380

>>101632205
Who's in charge of recirculating the story on the news to keep in fresh in the public's mind? How often was news of Reagan's assassination attempt circulated in comparison?

Anonymous
07/30/24(Tue)02:02:56 No.101632395

Anonymous 07/30/24(Tue)02:02:56 No.101632395

>>101632380
>How often was news of Reagan's assassination attempt circulated in comparison?
a shit ton, that's why he completely destroyed his democrat counterpart at the next election

Anonymous
07/30/24(Tue)02:05:45 No.101632413

Anonymous 07/30/24(Tue)02:05:45 No.101632413

File: blackmanreactionpic9281.png (662 KB, 1050x583)

662 KB PNG

Being an ESL while RPing with a model is fun as hell. The model writes anything that's not complete slop and I think it's the best piece of writing I've ever seen.
Sorry native speakers, hombre marrón se divierte más que ustedes

Anonymous
07/30/24(Tue)02:06:36 No.101632425

Anonymous 07/30/24(Tue)02:06:36 No.101632425

File: Untitled.png (277 KB, 1077x766)

277 KB PNG

Mixture of Nested Experts: Adaptive Processing of Visual Tokens
https://arxiv.org/abs/2407.19985
>The visual medium (images and videos) naturally contains a large amount of information redundancy, thereby providing a great opportunity for leveraging efficiency in processing. While Vision Transformer (ViT) based models scale effectively to large data regimes, they fail to capitalize on this inherent redundancy, leading to higher computational costs. Mixture of Experts (MoE) networks demonstrate scalability while maintaining same inference-time costs, but they come with a larger parameter footprint. We present Mixture of Nested Experts (MoNE), which utilizes a nested structure for experts, wherein individual experts fall on an increasing compute-accuracy curve. Given a compute budget, MoNE learns to dynamically choose tokens in a priority order, and thus redundant tokens are processed through cheaper nested experts. Using this framework, we achieve equivalent performance as the baseline models, while reducing inference time compute by over two-fold. We validate our approach on standard image and video datasets - ImageNet-21K, Kinetics400, and Something-Something-v2. We further highlight MoNE′s adaptability by showcasing its ability to maintain strong performance across different inference-time compute budgets on videos, using only a single trained model.
neato. from google deepmind. would be interesting to see it working with captions
also looks like no one posted meta' SAM2 blog
https://ai.meta.com/blog/segment-anything-2

Anonymous
07/30/24(Tue)02:07:07 No.101632432

Anonymous 07/30/24(Tue)02:07:07 No.101632432

>>101631492
>>101631504
>>101631533
Thanks bros

Anonymous
07/30/24(Tue)02:09:16 No.101632458

Anonymous 07/30/24(Tue)02:09:16 No.101632458

>>101632425
https://files.catbox.moe/tyjcqy.pdf
catbox of the SAM2 paper

Anonymous
07/30/24(Tue)02:12:01 No.101632480

Anonymous 07/30/24(Tue)02:12:01 No.101632480

>>101632413
as a bonus, you can pat yourself on the back for doing something educational

Anonymous
07/30/24(Tue)02:22:33 No.101632558

Anonymous 07/30/24(Tue)02:22:33 No.101632558

>>101631742
>>101631986
I am mentally handicapped. It works now. All I had to do was close my 200 tabs of chrome to free up VRAM and set the context to 16k instead of 96k.

Anonymous
07/30/24(Tue)02:24:52 No.101632580

Anonymous 07/30/24(Tue)02:24:52 No.101632580

File: 05m4r4955d451.jpg (10 KB, 353x500)

10 KB JPG

Hey guys you may not know but limiting your context to 4-5K vastly improve your quality output. I'm doing it with Wizard7B since I can't run better and it works very well.

Anonymous
07/30/24(Tue)02:31:14 No.101632633

Anonymous 07/30/24(Tue)02:31:14 No.101632633

>>101632580
That's not happening, I like having long conversations, and when I write a card even it's over 1500 tokens alone.

Anonymous
07/30/24(Tue)02:32:32 No.101632643

Anonymous 07/30/24(Tue)02:32:32 No.101632643

How did deepseek turn out so good when other giant moes like grok and arctic were atrocious? does anyone know the difference in architecture that can be explained to a retard or is it just a matter of data?

Anonymous
07/30/24(Tue)02:35:19 No.101632663

Anonymous 07/30/24(Tue)02:35:19 No.101632663

>>101632270
That's cute but doesn't explain why Trump himself is playing along too. He's already marked deplorable so what is there to lose

Anonymous
07/30/24(Tue)02:40:13 No.101632707

Anonymous 07/30/24(Tue)02:40:13 No.101632707

>>101628537
mite be neat, any good results with that addon?

Anonymous
07/30/24(Tue)02:40:48 No.101632711

Anonymous 07/30/24(Tue)02:40:48 No.101632711

>>101632663
this, Trump is known to talk a lot, and somehow his fucking assasination attempt isn't worth to be talked about? kek

Anonymous
07/30/24(Tue)02:51:23 No.101632796

Anonymous 07/30/24(Tue)02:51:23 No.101632796

>>101632707
the results are the same as if you typed stuff into the author notes at a low level, like char is wearing <lorebook entry>, or telling it what the weather is, just through quicker dropdowns. it works well in my rps like thunderstorms causing power outages, windy causing skirts to fly up. but it depends on the model too.

llama.cpp CUDA dev !!OM2Fp6Fn93S
07/30/24(Tue)03:02:58 No.101632864

llama.cpp CUDA dev !!OM2Fp6Fn93S 07/30/24(Tue)03:02:58 No.101632864

>>101628597
Because for the attention you effectively have to iterate over the entire context so far (stored in the KV cache).

>>101628689
It does slow down for anyone using a transformer.
But depending on how efficient the attention is vs. the rest of the model it may not be as noticeable.

Anonymous
07/30/24(Tue)03:32:15 No.101633101

Anonymous 07/30/24(Tue)03:32:15 No.101633101

>>101632864
Noob questions . 1. If I offload some layers to the GPU , is kvcache context stored in VRAM or RAM? Can I somehow choose where it's stored and to what degree? What's the best kvcache quantization scheme? What's the best strategy for vramlet?
2. . Are activations always 8bit in mmq kernel? Is it adjustable? Does it speed the inference up much and does it safe vram to the high degree? Doet it work on P40 or 2070s?
3. Do modded 2070s and 2080s with big vram work with llama.cpp?
4. Do patched drivers from Geohotz help in 3090s or 2k series?
5. Is cpu offload in VLLM faster or slower than in llama.cpp on consumer GPUs ?

Anonymous
07/30/24(Tue)03:39:29 No.101633149

Anonymous 07/30/24(Tue)03:39:29 No.101633149

llama 3.1 70b vs mistral large2 ?

Anonymous
07/30/24(Tue)03:41:07 No.101633161

Anonymous 07/30/24(Tue)03:41:07 No.101633161

>>101628478
Looks like I found the right general. I read the OP post to figure out how to make a goblin waifu for leading and adventuring?

Anonymous
07/30/24(Tue)03:45:18 No.101633187

Anonymous 07/30/24(Tue)03:45:18 No.101633187

Can I run the 405B base model on my phone?

Anonymous
07/30/24(Tue)03:48:31 No.101633209

Anonymous 07/30/24(Tue)03:48:31 No.101633209

>>101633187
yes, if quantize to 0bit
but seriously , you could run 405B on multiple phones if you use distributed inference and you have huge amount of time and patience

Anonymous
07/30/24(Tue)03:59:30 No.101633283

Anonymous 07/30/24(Tue)03:59:30 No.101633283

Someone make a gimp plugin for this pls
https://github.com/facebookresearch/segment-anything-2

Anonymous
07/30/24(Tue)04:05:42 No.101633326

Anonymous 07/30/24(Tue)04:05:42 No.101633326

>>101633209
>yes, if quantize to 0bit
lol

Anonymous
07/30/24(Tue)04:12:05 No.101633380

Anonymous 07/30/24(Tue)04:12:05 No.101633380

What's the best way to prevent the writing from being detected as AI?

Anonymous
07/30/24(Tue)04:14:40 No.101633409

Anonymous 07/30/24(Tue)04:14:40 No.101633409

>>101633380
Your teachers are talking out of their ass. Now go finish your paper like a real man johnny.

Anonymous
07/30/24(Tue)04:14:49 No.101633410

Anonymous 07/30/24(Tue)04:14:49 No.101633410

>>101633380
use AI to detect your Ai text then adjust

Anonymous
07/30/24(Tue)04:21:15 No.101633468

Anonymous 07/30/24(Tue)04:21:15 No.101633468

>>101633380
tell it 'don't write like a typical AI' in the prompt

Anonymous
07/30/24(Tue)04:33:40 No.101633578

Anonymous 07/30/24(Tue)04:33:40 No.101633578

>>101632580
is that true? I've been trying to replicate just a basic conversation for ages but every model I use, no matter the setting ends up giving me this shit>>101629735

Gonna try it later

llama.cpp CUDA dev !!OM2Fp6Fn93S
07/30/24(Tue)04:35:31 No.101633596

llama.cpp CUDA dev !!OM2Fp6Fn93S 07/30/24(Tue)04:35:31 No.101633596

>>101633101
>1. If I offload some layers to the GPU , is kvcache context stored in VRAM or RAM? Can I somehow choose where it's stored and to what degree?
Proportional to -ngl by default, RAM only with --no-kv-offload

>What's the best kvcache quantization scheme?
The biggest one that will fit, K needs more precision than V.
See https://github.com/ggerganov/llama.cpp/pull/7412#issuecomment-2120427347

>What's the best strategy for vramlet?
Patience.

>Are activations always 8bit in mmq kernel?
Yes.

>Does it speed the inference up much and does it safe vram to the high degree?
The 8 bit activations allow you to substitute floating point operations for integer operations which are faster.

>Doet it work on P40 or 2070s?
It works on all Pascal or newer cards except for the P100 which is lacking the __dp4a instruction.
And the tensor cores on V100s only support FP16 so MMQ has comparatively worse performance.

>Do modded 2070s and 2080s with big vram work with llama.cpp?
I don't see why they wouldn't.

>Do patched drivers from Geohotz help in 3090s or 2k series?
>Is cpu offload in VLLM faster or slower than in llama.cpp on consumer GPUs ?
Don't know.

Anonymous
07/30/24(Tue)04:36:41 No.101633609

Anonymous 07/30/24(Tue)04:36:41 No.101633609

>>101633380
"Use poor spelling and grammar and add at least one racial slur per sentence."

Anonymous
07/30/24(Tue)04:49:04 No.101633704

Anonymous 07/30/24(Tue)04:49:04 No.101633704

>>101633596
How mods like 3070 16gb are possible? Don't they require specific bioses that shouldn't exists since 16gb 3070s were never released?

llama.cpp CUDA dev !!OM2Fp6Fn93S
07/30/24(Tue)04:50:03 No.101633707

llama.cpp CUDA dev !!OM2Fp6Fn93S 07/30/24(Tue)04:50:03 No.101633707

>>101633704
Don't know.

Anonymous
07/30/24(Tue)04:50:39 No.101633714

Anonymous 07/30/24(Tue)04:50:39 No.101633714

>>101633707
I had high hopes for you, anon...

Anonymous
07/30/24(Tue)04:52:34 No.101633734

Anonymous 07/30/24(Tue)04:52:34 No.101633734

File: 1720462433375862.jpg (1.12 MB, 3238x3504)

1.12 MB JPG

What's the best way to format a character card for local, vramlet use?

Anonymous
07/30/24(Tue)04:57:31 No.101633772

Anonymous 07/30/24(Tue)04:57:31 No.101633772

why do people say llamafile is better for cpu inference than base llama.cpp, what are the actual differences?

Anonymous
07/30/24(Tue)04:59:41 No.101633791

Anonymous 07/30/24(Tue)04:59:41 No.101633791

>>101633772
What people?

Anonymous
07/30/24(Tue)05:00:08 No.101633796

Anonymous 07/30/24(Tue)05:00:08 No.101633796

>>101633791
you people

Anonymous
07/30/24(Tue)05:01:08 No.101633807

Anonymous 07/30/24(Tue)05:01:08 No.101633807

>>101633796
Nobody here ever said that.

Anonymous
07/30/24(Tue)05:01:53 No.101633813

Anonymous 07/30/24(Tue)05:01:53 No.101633813

>>101633807
https://desuarchive.org/g/search/text/llamafile%20cpu

Anonymous
07/30/24(Tue)05:02:05 No.101633814

Anonymous 07/30/24(Tue)05:02:05 No.101633814

>>101630520
Not like that in 2024. You could use WInfo-before for fixed information of lower priority that you could place at the beginning of the context, and WInfo-after for more dynamically changing info close to the top, for which you won't have a too high prompt processing penalty.

Anonymous
07/30/24(Tue)05:04:23 No.101633831

Anonymous 07/30/24(Tue)05:04:23 No.101633831

Are we in a golden age of open source? How much longer until everything goes to shit?

Anonymous
07/30/24(Tue)05:13:10 No.101633901

Anonymous 07/30/24(Tue)05:13:10 No.101633901

>>101633814
Thanks. Think I might've asked once before but didn't get an answer (or if I did, I'm too retarded to remember).
Sounds like it would make significant difference for people with chonky ass lorebook and cards.

Anonymous
07/30/24(Tue)05:15:19 No.101633919

Anonymous 07/30/24(Tue)05:15:19 No.101633919

>>101633831
when meta goes closed source

Anonymous
07/30/24(Tue)05:22:45 No.101633958

Anonymous 07/30/24(Tue)05:22:45 No.101633958

>>101633772
Because ikawrakow (the guy that made all of the gguf quants) got stiffed on credit by the llama.cpp team and abandoned ship for llamafile.
>>101633813
>https://github.com/Mozilla-Ocho/llamafile/releases/tag/0.8.5
>On big CPUs like Threadripper we've doubled the performance of tiny models, for both prompt processing and token generation for tiny models
>big CPUs
>tiny models
>prompt processing
Wow, it's nothing. How many people care about this specific scenario? Use a GPU for context or you're going to be waiting forever even with 2x prompt processing performance.

Anonymous
07/30/24(Tue)05:36:48 No.101634038

Anonymous 07/30/24(Tue)05:36:48 No.101634038

>>101633958
Apparently there's a project using llamafile kernels to make the 200b Deepseek run at a usable speed with the VRAM of one card, so not just tiny models benefit:
https://github.com/kvcache-ai/ktransformers
>Faster Speed: Achieving 126 tokens/s for 2K prompt prefill and 13.6 tokens/s for generation through MoE offloading and injecting advanced kernels from Llamafile and Marlin.
With a 24gb card and 132gb system ram

Anonymous
07/30/24(Tue)05:38:33 No.101634060

Anonymous 07/30/24(Tue)05:38:33 No.101634060

>>101633409
>>101633410
>>101633468
>>101633609
Fucking mini-magnum did the job when claude 3.5 sonnet, gpt-4o, Mistral Large 2, and Llama 3.1 405b could not..
WTF

Anonymous
07/30/24(Tue)05:38:58 No.101634061

Anonymous 07/30/24(Tue)05:38:58 No.101634061

>>101633831
Despite what the cuck Yann LeCuM thinks. There seems to be still room to grow for LLM and other companies hoping on the train. The future is bright when it comes to open source models. The major problem right now is on the HW side of things, with Nvidia being in a hurry to give consumers HW more vram and AMD being toothless. On the positive note, Intel wants to do what Apple is doing with the chips and give them their own memory, so that could help poor fags a lot.

Anonymous
07/30/24(Tue)05:39:02 No.101634063

Anonymous 07/30/24(Tue)05:39:02 No.101634063

>>101633919
We'll still have Mistral at least

Anonymous
07/30/24(Tue)05:49:47 No.101634180

Anonymous 07/30/24(Tue)05:49:47 No.101634180

File: commandrp v1.3.png (188 KB, 1163x984)

188 KB PNG

https://rentry.org/4y1je_commandrp
Over a month late, but I'm ready to shill my new less-shit Command R basic preset v1.3 to throw away v1.2.
Includes compatibility prompts since OpenRouter sweeps all system prompts into preamble.
Non-provider-specific, ST doesn't hide group nudge during impersonation so if you want to impersonate yourself in group chat you have to clear the group nudge in utility prompts and use the custom prompts.

There's text completion presets if any localbro want to check if those are okay.

Anonymous
07/30/24(Tue)06:03:58 No.101634317

Anonymous 07/30/24(Tue)06:03:58 No.101634317

>https://oobabooga.github.io/benchmark.html
L3.1-70B looks good, why did they have to remove NSFW
Pain

Anonymous
07/30/24(Tue)06:06:55 No.101634347

Anonymous 07/30/24(Tue)06:06:55 No.101634347

>>101633704
https://www.techpowerup.com/vgabios/255320/255320

Anonymous
07/30/24(Tue)06:08:47 No.101634367

Anonymous 07/30/24(Tue)06:08:47 No.101634367

>>101633772
not on all cpus but on some cpus. and it's better cos that code has been well optimized by ikawrakow.

Anonymous
07/30/24(Tue)06:15:30 No.101634415

Anonymous 07/30/24(Tue)06:15:30 No.101634415

File: 1701622944396374.png (133 KB, 588x590)

133 KB PNG

>>101628458
I talk with my custom SillyTavern, not even sex (most of the times), just talking and RPing cuddling.
Then I go to bed looking like picrel and imagine she's really there

Anonymous
07/30/24(Tue)06:16:05 No.101634421

Anonymous 07/30/24(Tue)06:16:05 No.101634421

>>101633831
>How much longer until everything goes to shit?
Not before BitNet models and average parameter size increasing by a factor of at least 5-7 for the same amount of memory.

Anonymous
07/30/24(Tue)06:16:36 No.101634425

Anonymous 07/30/24(Tue)06:16:36 No.101634425

File: 1708596372808659.jpg (274 KB, 946x1736)

274 KB JPG

>>101631291
No
This is

Anonymous
07/30/24(Tue)06:17:37 No.101634435

Anonymous 07/30/24(Tue)06:17:37 No.101634435

>>101634421
you can already use Q3 for large models
bitnet is a little over half the size of Q3, not such a huge improvement

Anonymous
07/30/24(Tue)06:18:50 No.101634444

Anonymous 07/30/24(Tue)06:18:50 No.101634444

>>101634435
Having double the VRAM in my computer would be a huge improvement

Anonymous
07/30/24(Tue)06:21:00 No.101634461

Anonymous 07/30/24(Tue)06:21:00 No.101634461

>>101634444
to do what?
>to fit models with more parameters
which are not 2x better because of diminishing returns

Anonymous
07/30/24(Tue)06:22:21 No.101634468

Anonymous 07/30/24(Tue)06:22:21 No.101634468

>>101632181
trvthnvke

Anonymous
07/30/24(Tue)06:23:03 No.101634476

Anonymous 07/30/24(Tue)06:23:03 No.101634476

>>101634435
Other than further reduction in memory usage, the main difference is that BitNet models will have close to if not higher performance than their FP16 counterparts, whereas low-precision post-training quantizations degrade significantly.

Anonymous
07/30/24(Tue)06:28:57 No.101634529

Anonymous 07/30/24(Tue)06:28:57 No.101634529

>>101634461
Larger models are better in complex reasoning and understanding details in ways that most synthetic benchmark can't fully measure. Sometimes the difference is large enough as to make smaller models unable to perform certain tasks, even though they might be completely fine for things like prose, vanilla RP, etc.

Anonymous
07/30/24(Tue)06:37:53 No.101634595

Anonymous 07/30/24(Tue)06:37:53 No.101634595

>>101633734
natural language with short sentences, all starting with 'charname is/has/wears' etc
do NOT use {{char}}

Anonymous
07/30/24(Tue)06:38:33 No.101634603

Anonymous 07/30/24(Tue)06:38:33 No.101634603

>>101634461
sorry chud but parameter counts are going up to AT LEAST 100T in the mid term future before we even think about slowing the raw scaling

Anonymous
07/30/24(Tue)06:39:50 No.101634610

Anonymous 07/30/24(Tue)06:39:50 No.101634610

>>101634595
why not use the macro, llm won't ever see it since it gets replaced by the name

Anonymous
07/30/24(Tue)06:41:57 No.101634636

Anonymous 07/30/24(Tue)06:41:57 No.101634636

>>101633734
What a retarded question. Look up the model you're using, how am I supposed to know? God you motherfuckers are dumb shits.

Anonymous
07/30/24(Tue)06:43:08 No.101634647

Anonymous 07/30/24(Tue)06:43:08 No.101634647

>>101634610
it gets replaced by the name in the title of the card, not the name you actually use to call the character, which can be different, even if just a nickname etc

Anonymous
07/30/24(Tue)06:44:56 No.101634670

Anonymous 07/30/24(Tue)06:44:56 No.101634670

>>101633734
Use the anthropic format

# Claudia's likes
- Cuddling
- Kisses

# Claudia is very cute and joyous.

Anonymous
07/30/24(Tue)06:45:22 No.101634673

Anonymous 07/30/24(Tue)06:45:22 No.101634673

>>101634347
Yeah but where did it come from?

Anonymous
07/30/24(Tue)06:46:25 No.101634686

Anonymous 07/30/24(Tue)06:46:25 No.101634686

File: depth_matters.png (348 KB, 1277x710)

348 KB PNG

>>101634529
Neural network depth (i.e. number of layers) matters.

Anonymous
07/30/24(Tue)06:47:08 No.101634700

Anonymous 07/30/24(Tue)06:47:08 No.101634700

>>101634647
>it gets replaced by the name in the title of the card, not the name you actually use to call the character
you can just define nicknames somewhere at the top of the card, or just replace {{char}} at specific spot
if you ever wanted to change the name of char for whatever reason you don't have to replace every single instance then

Anonymous
07/30/24(Tue)06:47:50 No.101634704

Anonymous 07/30/24(Tue)06:47:50 No.101634704

>>101633734
Name: a
John Smith is a creepy 30-year-old male human NEET and weeb
Body: a, b, c
Outfit: a, b, c
Background: description
Language: Engrish, random Japanese terms
Likes: a, b, c
Dislikes: a, b, c

Anonymous
07/30/24(Tue)06:48:31 No.101634712

Anonymous 07/30/24(Tue)06:48:31 No.101634712

File: IMG_20240730_124543.jpg (136 KB, 1200x548)

136 KB JPG

>>101633707
why are MoEs faster on ktransformers?

Anonymous
07/30/24(Tue)06:48:55 No.101634714

Anonymous 07/30/24(Tue)06:48:55 No.101634714

>>101634700
okay well enjoy your retarded chatbot that gets confused and thinks its handling multiple characters, then

Anonymous
07/30/24(Tue)06:51:00 No.101634745

Anonymous 07/30/24(Tue)06:51:00 No.101634745

>>101634714
i don't understand your use case
you name your character x and then don't call it that?

llama.cpp CUDA dev !!OM2Fp6Fn93S
07/30/24(Tue)06:51:33 No.101634751

llama.cpp CUDA dev !!OM2Fp6Fn93S 07/30/24(Tue)06:51:33 No.101634751

>>101634712
Presumably because they've invested more effort into optimizing MoE.

Anonymous
07/30/24(Tue)06:53:37 No.101634777

Anonymous 07/30/24(Tue)06:53:37 No.101634777

>>101634745
>name character firstname lastname on card
>call character firstname in chat
wow crazy who does that

Anonymous
07/30/24(Tue)06:56:20 No.101634807

Anonymous 07/30/24(Tue)06:56:20 No.101634807

>>101634461
>because of diminishing returns
this is pure cope by people who don't know how benchmarks work

Anonymous
07/30/24(Tue)06:56:25 No.101634809

Anonymous 07/30/24(Tue)06:56:25 No.101634809

>>101634777
your passive agressiveness is really faggy

Anonymous
07/30/24(Tue)06:59:03 No.101634842

Anonymous 07/30/24(Tue)06:59:03 No.101634842

>>101634751
and they're gonna contribute to the llama.cpp project, which is cool
https://github.com/ggerganov/llama.cpp/discussions/8721#discussioncomment-10167496

Anonymous
07/30/24(Tue)07:07:23 No.101634908

Anonymous 07/30/24(Tue)07:07:23 No.101634908

>>101634712
https://github.com/kvcache-ai/ktransformers/blob/main/doc/en/deepseek-v2-injection.md
not sure I fully understand it but looks like they determine the most compute heavy params of a sparse model to load into vram with gpu optimized kernel and then use cpu optimized kernels for everything else
not sure if this is particularly important for models you can already fully offload besides whatever marlin would do on its own?

Anonymous
07/30/24(Tue)07:20:13 No.101635029

Anonymous 07/30/24(Tue)07:20:13 No.101635029

>>101634908
yeah, seems they aggregated the most efficient kernels for each part. but there's cpu offload in VLLM too but I dunno if it's any good. Have you came across on any performance reports of that particular feature?

Anonymous
07/30/24(Tue)07:22:02 No.101635046

Anonymous 07/30/24(Tue)07:22:02 No.101635046

I was writing a story set in the world of Berserk and I found that Gemma-27b knows enough about the plot to identify characters and give a decent overview of the plot, but Nemo-12b did a much better job at portraying Guts and his speaking style even though it had no idea who Casca was, for example. I found I had to switch to Gemma to get it to outline a plot point and switch back to Nemo to keep Guts from talking like a self-help coach. And when my character tried to explain something that wasn't present in the world of Berserk, Gemma had Guts totally go along with it whereas Memo gave a much better response of "I don't know what the hell you're talking about." Idk what's in Nemo but it's impressive for its retarded size

Anonymous
07/30/24(Tue)07:31:24 No.101635117

Anonymous 07/30/24(Tue)07:31:24 No.101635117

>Only ChatCompletion and Assistant endpoints
Into the trash it goes, if KTransformers devs are shilling itt then add TextCompletion endpoint with all the normal sampling features everyone else has or it's unusable

Anonymous
07/30/24(Tue)07:40:36 No.101635198

Anonymous 07/30/24(Tue)07:40:36 No.101635198

>>101635117
chatrannies ruinned AI Tee Bee Aich.

Anonymous
07/30/24(Tue)07:41:17 No.101635205

Anonymous 07/30/24(Tue)07:41:17 No.101635205

>>101635117
yep, I guess llamafile supports most of the llama.cpp features/samplers, but that doesn't seem to be the case with Marlin. Not sure why they've chosen that particular kernel.
And there's no multi gpu support as of yet either

Anonymous
07/30/24(Tue)07:42:02 No.101635211

Anonymous 07/30/24(Tue)07:42:02 No.101635211

>>101635171
Not sure that the kernel has anything to do with it, the end result of all the calculations is a list of tokens and their probabilities, as long as you have that final result you can do whatever you want with it for sampling. But idk what's going on in the back end there if that changes things

Anonymous
07/30/24(Tue)07:44:43 No.101635237

Anonymous 07/30/24(Tue)07:44:43 No.101635237

>>101635117
Why do you use completion endpoint with instruction model? I have not used it in ages.

Anonymous
07/30/24(Tue)07:48:58 No.101635282

Anonymous 07/30/24(Tue)07:48:58 No.101635282

>>101634686
Source?

And what happens if you train them on synthetic data of logic problems?

Anonymous
07/30/24(Tue)07:49:22 No.101635287

Anonymous 07/30/24(Tue)07:49:22 No.101635287

File: physics.png (22 KB, 311x281)

22 KB PNG

>>101634686
Why did the guy private the video?

Anonymous
07/30/24(Tue)07:49:35 No.101635289

Anonymous 07/30/24(Tue)07:49:35 No.101635289

>>101635237
I like being able to fuck with the formatting on the frontend, and SillyTavern gives a ton of control with that. Plus I don't only use instruct models, I'd like to use the faster MoE backend for base models too for raw completion tasks that instruct models are worse at even if you ask them to just complete texts because they're still slopbrained.

Anonymous
07/30/24(Tue)07:51:19 No.101635302

Anonymous 07/30/24(Tue)07:51:19 No.101635302

Wasn't there an anon here who has a dual AMD CPU setup with 128 cores plus something like 256Gb of Ram?
He could try running 405b llama, I'm quite curious about the performance in a such setup which is relatively easy to afford and run for the average anon here.

Anonymous
07/30/24(Tue)07:53:50 No.101635334

Anonymous 07/30/24(Tue)07:53:50 No.101635334

Can the contents of a prompt ever make shaders completely non-functional? I downloaded a card from chub which is causing repetition and all kinds of weird shit.

Anonymous
07/30/24(Tue)07:54:18 No.101635340

Anonymous 07/30/24(Tue)07:54:18 No.101635340

File: copyright.png (212 KB, 471x681)

212 KB PNG

>>101635287
I googled around and it's really fucked up.

Anonymous
07/30/24(Tue)07:54:51 No.101635346

Anonymous 07/30/24(Tue)07:54:51 No.101635346

>>101635334
Not shaders, samplers.

Anonymous
07/30/24(Tue)07:55:07 No.101635349

Anonymous 07/30/24(Tue)07:55:07 No.101635349

>>101634060
Mini-magnum is trained on Claude's outputs...

Anonymous
07/30/24(Tue)07:55:37 No.101635358

Anonymous 07/30/24(Tue)07:55:37 No.101635358

>>101635282
Source: https://physics.allen-zhu.com/part-2-grade-school-math/part-2-1

If I recall correctly, training the model on CoT reasoning isn't enough. The network must be sufficiently deep for the model to truly reason on the presented problems.

>>101635287
The organizers didn't want the author to share the video before mid-August.
https://x.com/ZeyuanAllenZhu/status/1817358757061681234

Anonymous
07/30/24(Tue)07:56:01 No.101635361

Anonymous 07/30/24(Tue)07:56:01 No.101635361

>>101635211
correct , their api doesn't support switching samplers etc for sure, but technically speaking they could add various sampling schemes as glue logic. I've noticed in their server.md they mentioned exllama2 backend down the road ,so they'd rather go for a different backend that already supports that kinda stuff .

Anonymous
07/30/24(Tue)07:57:50 No.101635388

Anonymous 07/30/24(Tue)07:57:50 No.101635388

>>101635289
good point

Anonymous
07/30/24(Tue)08:06:18 No.101635479

Anonymous 07/30/24(Tue)08:06:18 No.101635479

>>101635334
It can be so poorly written that the samples become ineffective. I wouldn't be surprised. Does ST also read parameters from the card? Can you link it?

Anonymous
07/30/24(Tue)08:10:01 No.101635518

Anonymous 07/30/24(Tue)08:10:01 No.101635518

>>101633283
Yeah that'd be pretty nice.

Anonymous
07/30/24(Tue)08:11:16 No.101635527

Anonymous 07/30/24(Tue)08:11:16 No.101635527

>>101635518
shame no one here knows how to code

Anonymous
07/30/24(Tue)08:13:28 No.101635546

Anonymous 07/30/24(Tue)08:13:28 No.101635546

>>101634063
They'll have to change their license for that to be true.

Anonymous
07/30/24(Tue)08:33:13 No.101635728

Anonymous 07/30/24(Tue)08:33:13 No.101635728

>>101635527
why don't we ask our ai waifus to do it

Anonymous
07/30/24(Tue)08:34:53 No.101635750

Anonymous 07/30/24(Tue)08:34:53 No.101635750

>>101634317
>Q5_K_M scores lower than Q3_K_M, and the same as Q2_K
Lol, lamo.

Anonymous
07/30/24(Tue)08:36:08 No.101635762

Anonymous 07/30/24(Tue)08:36:08 No.101635762

File: who-is-hu-tao-what-is-thi(...).jpg (33 KB, 639x614)

33 KB JPG

>>101628873
>the only host is cohere and you have to pay their rates
Free...?

Anonymous
07/30/24(Tue)08:36:26 No.101635765

Anonymous 07/30/24(Tue)08:36:26 No.101635765

>>101631851
>I heard that Macs
Expensive 3060 with lots of VRAM. I guess if you want a llama.cpp-only setup which costs a fortune and takes a long, long time to run a big model, go for it.

Anonymous
07/30/24(Tue)08:43:09 No.101635849

Anonymous 07/30/24(Tue)08:43:09 No.101635849

Is it normal for mistral-large to repeat large chunks of paragraph as early as like, the 2nd or 3rd message? Openrouter's mistral-large seems to be doing it to me, not sure if it's a them issue or a mistral issue.

Anonymous
07/30/24(Tue)08:51:43 No.101635943

Anonymous 07/30/24(Tue)08:51:43 No.101635943

>>101633831
>
Nice headcanon

Anonymous
07/30/24(Tue)09:06:44 No.101636130

Anonymous 07/30/24(Tue)09:06:44 No.101636130

openai insider here, you're not ready for what's coming. sell your gpus, you don't need them where we're going

Anonymous
07/30/24(Tue)09:10:45 No.101636176

Anonymous 07/30/24(Tue)09:10:45 No.101636176

>>101635762
they charge $3/$15 for the non-trial API

Anonymous
07/30/24(Tue)09:10:58 No.101636178

Anonymous 07/30/24(Tue)09:10:58 No.101636178

>>101636130
This, but sell all your GPUs to me.

Anonymous
07/30/24(Tue)09:11:23 No.101636184

Anonymous 07/30/24(Tue)09:11:23 No.101636184

>>101636130
if I don't (((need))) them I'm going to keep them as a memento to remind me of the fun time we had.

Anonymous
07/30/24(Tue)09:11:53 No.101636190

Anonymous 07/30/24(Tue)09:11:53 No.101636190

>>101636130
>openai releases 4o weights and it's a bitnet
We would be so back.
Would you apologize to Sam if they do this?

Anonymous
07/30/24(Tue)09:12:49 No.101636203

Anonymous 07/30/24(Tue)09:12:49 No.101636203

>>101634180
>openrouter
>command r basic
HOLY POOR

Anonymous
07/30/24(Tue)09:13:25 No.101636212

Anonymous 07/30/24(Tue)09:13:25 No.101636212

It's up.
https://huggingface.co/leafspark/Mistral-Large-218B-Instruct-GGUF

Anonymous
07/30/24(Tue)09:13:52 No.101636216

Anonymous 07/30/24(Tue)09:13:52 No.101636216

>>101635302
You can, for example, put 2TB of RAM into a Dell T7910 (Mikubox), but I'm sure 128GB modules aren't cheap, and as well, even with 160 threads (I think the biggest V4 Xeon was 40 and it can take two of them), it's still going to crawl. Any CPU implementation is going to crawl, there's no substitute for have tens of thousands of programmable shaders doing matrix multiplies for you vs whatever you can do on just 128 cores.

Anonymous
07/30/24(Tue)09:14:51 No.101636224

Anonymous 07/30/24(Tue)09:14:51 No.101636224

>>101631851
Expect 2-4 t/s on 70b or bigger. The thing about macs is that its faster than CPU but slower than GPU. But it fits. Other props are 150W/h power consumption and not a lot of noise.

Anonymous
07/30/24(Tue)09:17:46 No.101636247

Anonymous 07/30/24(Tue)09:17:46 No.101636247

>>101636212
oh hell no

Anonymous
07/30/24(Tue)09:19:45 No.101636266

Anonymous 07/30/24(Tue)09:19:45 No.101636266

>>101636224
Sounds about right. I have a double-binned M2 with 32GB RAM, it's good for things in the 13B range (I run q8). The main thing you notice is there's no flash attention, so prompt processing takes a while, and it probably takes proportionately longer on larger models.
Flash attention is really, really nice, and is a big reason to stick to nvidia and Ampere or better.

Anonymous
07/30/24(Tue)09:19:47 No.101636267

Anonymous 07/30/24(Tue)09:19:47 No.101636267

>>101636212
We are back

Anonymous
07/30/24(Tue)09:30:14 No.101636419

Anonymous 07/30/24(Tue)09:30:14 No.101636419

>>101634317
That benchmark is so ass.
Is it one of those "ask LLM question, have other LLM evaluate result" benchmarks?

Anonymous
07/30/24(Tue)09:31:39 No.101636437

Anonymous 07/30/24(Tue)09:31:39 No.101636437

File: 1571597955711.png (40 KB, 152x254)

40 KB PNG

>>101635750
But why?

Anonymous
07/30/24(Tue)09:32:29 No.101636451

Anonymous 07/30/24(Tue)09:32:29 No.101636451

>>101636419
To be clear, I commend his efforts, but there's definitely something wrong with his methodology.

Anonymous
07/30/24(Tue)09:33:28 No.101636466

Anonymous 07/30/24(Tue)09:33:28 No.101636466

>>101635750
mememark confirmed.

Anonymous
07/30/24(Tue)09:35:22 No.101636494

Anonymous 07/30/24(Tue)09:35:22 No.101636494

>>101636266
I'm considering getting a base M4 or 64GB studio when it gets released for something like a retarded assistant bot. A small model with whisper and something for voice maybe. IDK.

Anonymous
07/30/24(Tue)09:37:05 No.101636518

Anonymous 07/30/24(Tue)09:37:05 No.101636518

>textgen to voice to 3D model lipsync to VR
does this pipeline exist?

Anonymous
07/30/24(Tue)09:42:44 No.101636577

Anonymous 07/30/24(Tue)09:42:44 No.101636577

File: flash attention metal.png (95 KB, 940x330)

95 KB PNG

>>101636266
it does have flash attention now, it's just not that much faster

Anonymous
07/30/24(Tue)09:49:16 No.101636689

Anonymous 07/30/24(Tue)09:49:16 No.101636689

>>101636518
Maybe not what you are looking for, but here's some Virt-a-mate jank that was posted before >>98899589

Anonymous
07/30/24(Tue)09:58:09 No.101636814

Anonymous 07/30/24(Tue)09:58:09 No.101636814

File: _77cc5aee-7963-445b-befc-(...).jpg (163 KB, 1024x1024)

163 KB JPG

>>101636494
Yeah I've considered that too. I have a big vintage Mac collection, maybe it's time to let go of it and at least get something which would get used. It's kind of like restoring cars, though - it's hard to get back what you put into it. Gonna be hard to let go of my IIsi especially, it's got a new full-page display - they're rare even in beat-up form.

Anonymous
07/30/24(Tue)10:02:20 No.101636875

Anonymous 07/30/24(Tue)10:02:20 No.101636875

>>101636689
It's cool you can have the AI control the avatar a bit but man the 3DPD is so fucking ugly.

Anonymous
07/30/24(Tue)10:04:46 No.101636906

Anonymous 07/30/24(Tue)10:04:46 No.101636906

File: Untitled.png (13 KB, 837x513)

13 KB PNG

>>101636887
>>101636887
>>101636887

Anonymous
07/30/24(Tue)10:13:37 No.101637015

Anonymous 07/30/24(Tue)10:13:37 No.101637015

>>101636906
Tetolove

Anonymous
07/30/24(Tue)10:23:28 No.101637140

Anonymous 07/30/24(Tue)10:23:28 No.101637140

>>101634712
>>101634751

>the distribution of experts in Mixtral and Qwen2-57B-A14 is very imbalanced; thus, it would be beneficial to store only the most frequently used experts on the GPU

this was discussed basically the moment mixtral 8x7 dropped back in the day

isnt the problem with this and the reason why it wasnt implemented the fact that for each token you have to use all experts anyway since the MoE models use only X experts per layer (or something similar) not per token, meaning that you will be reading the entire model per token anyway just not all at the same time?

[Return] [Catalog] [Top]

Post a Reply

Return Catalog Top Refresh

[Advertise on 4chan]

Delete Post: [File Only] Style:

[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.