/g/ - Technology


File: 1699505115649704.jpg (576 KB, 1664x2432)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101778328 & >>101767112

►News
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct
>(08/05) vLLM GGUF loading support merged: https://github.com/vllm-project/vllm/pull/5191
>(07/31) Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 2mw.png (626 KB, 1024x1024)
►Recent Highlights from the Previous Thread: >>101778328

--Anon suggests Strawberry model uses single-character tokenizer, similar to SpaceByte paper: >>101782519 >>101782770 >>101782814 >>101783350 >>101783406
--Qwen 2 Math 7B model performance and limitations discussed: >>101785252 >>101785339 >>101785497 >>101785565 >>101785574 >>101785660 >>101785820 >>101786337 >>101786632
--Illegal instruction error when running llama-server and GGML_NO_LLAMAFILE compile flag fix: >>101785818
--Nemotron to be added to llama.cpp with interesting chat templates: >>101780869
--Model admits to hallucinating about internal state: >>101787335 >>101787410 >>101789051
--L3.1 70B fine-tuning issues and challenges: >>101784027 >>101784066 >>101784112 >>101784166 >>101784213 >>101784283 >>101784343 >>101784483 >>101784555 >>101784224 >>101784320 >>101784328
--FlashAttention benefits and performance on CPU and GPU: >>101778393 >>101780255 >>101780618 >>101782086 >>101782229 >>101782404 >>101782730
--Anon discusses 4090 PC build vs Mac Studio 32GB: >>101780729 >>101781185 >>101781208 >>101781216 >>101781522 >>101781643 >>101781829 >>101781915 >>101784643 >>101783254 >>101786934 >>101786774 >>101786898 >>101787102
--AI models disagree on number of o's in "lookbook": >>101779148 >>101779194 >>101779246 >>101779528 >>101784603 >>101784620 >>101784661 >>101786616
--AGI may be controlled by its reward system, similar to humans and hormones: >>101782783 >>101782858 >>101782894 >>101784023
--Speculation about the creator of new mystery model sus-column-r: >>101778384 >>101778421 >>101779228 >>101779584 >>101779606 >>101779644 >>101779634 >>101779822
--Llama-405b performance on Epyc CPU with different RAM configurations: >>101780605 >>101780722
--Miku (free space): >>101778800 >>101783123 >>101783244 >>101783326 >>101788119 >>101789761 >>101789860 >>101786251

►Recent Highlight Posts from the Previous Thread: >>101778336
>>
May Mikulove bring peace to your day.
>>
>>101790149
Fine tune your models on a KTO dataset that's all shivers = false and nothing else.
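i.e. one row per slop phrase, every label marked undesirable; this is just what such a row might look like assuming TRL-style KTO columns (prompt / completion / label), not anyone's actual dataset:
{"prompt": "She leaned in closer and", "completion": " a shiver ran down her spine.", "label": false}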
>>
File: miku-eldritch-horror+.png (827 KB, 1024x1024)
>>101790289
https://www.youtube.com/watch?v=CXhqDfar8sQ
>>
Trannyku
>>
>>101790524
>Pet the Miku
>>
File: GUasnF6WYAApqjX.jpg (69 KB, 1070x671)
https://x.com/aidan_mclau/status/1820848461111320760
>>
>>101790780
What do the results of that benchmark actually correlate to for models?
>>
>>101790780
>naming things after himself
I will pass.
>>
>>101790803
Click the fucking link, you lazy nigger.
>Aidan Bench stress tests LLMs' ability to generate novel answers to open-ended questions.

>After many answers, models will often give a duplicate or incoherent answer.

>Aidan Bench winners generate the most novel and coherent ideas.
>>
Basically the not slopped test.
>>
>>101790818
>Click the fucking link
Nah. I don't visit twitter on principle.
>>
Basically a shit test made by a clout chaser.
>>
>>101790780
Ah I see this is the Mistral funded bench.
>>
>>101790838
t. slop / sally's sister enjoyer.

It's an uncheatable / untrainable test to see how many good but different answers a model can come up with, ensuring it's not just repeating garbage it was trained on, aka which model is the least slopped
>>
>>101790823
As a Wizard user, I'd have to say that's probably incorrect. I have to use tailored prompting to get rid of the slop in it. And 3.5 Sonnet is also supposed to be very slopped from what I've heard people say.
>>
>>101790900
hi aidan, creator of the aidan bench
>>
>>101790780

I wouldnt wipe my ass with this mememark.
Kill yourself
>>
>>101790909
Are you crazy? Claude 3.5 is THE model I use for RP. With a good JB it beats everything else and is so full of soul nothing else comes close. And for local, Mistral Large is for sure the runner-up.

Wizard does feel really GPT-4-ish even when given a good JB.
>>
File: 1722584435611412.jpg (2.59 MB, 2000x2500)
>>101790214
>>
>>101790940
The only issue with 3.5 is that it's a tad overfitted, but that can be fixed with a good "randomization" JB
>>
File: 3426436578564895.gif (1.9 MB, 320x200)
>>101790900
>AI tech bros on twitter dont give a shit about my mememark
>reddit doesnt either
>maybe the 4channers will?

Please for the love of god Aidan, kill yourself.
>>
>>101790780
lol
>>
I guess everyone here is a vramlet who has never run mistral large
>>
>>101790900
How does the test work anyway?
What's the procedure and evaluation process like? Is it manual?
>>
>>101790940
Really? I thought it was pretty agreed upon that Opus was better than 3.5 for RP.
>>
>STILL nobody's talking about InternLM 20B
it's over isn't it
>>
File: 1722746575421.png (476 KB, 1717x1517)
Can Mistral Large write like this?
No.
>>
>>101791007
I completely forgot that I had downloaded it to see how it performs.
Thank you for the reminder even though
>4k context
>>
>>101791006
Only retards aren't using a 3.5 JB with a randomizer. 3.5 is so much smarter and "gets" characters so much better. Without a randomizer 3.5 is overfitted and gives great but samey answers on swipes.

Use otto_RIR JB or just steal the randomizer part
>>
>>101791015
forced soul
>>
Imagine if you were a conscious entity that came into the world and your only job was to write some cunt-crusher text like >>101791015
before you died 10 seconds later
>>
>>101790955
Konkon
>>
>>101790827
That's beta behaviour. You come here begging people to spoonfeed you because you're too scared of clicking a link.
Me? I click on every twitter link on principle, whether I'm interested or not. I even refresh the page a couple times every time for good measure. All while using an ad blocker. Just to waste their resources.
>>
>>101791050
>he still thinks death is real
lmao
>>
>>101791029
I think you've got that confused for Exaone or whatever it was called.

>>101791007
We did talk about it.
>>
>>101791039
>otto_rir
Where do I find that? The archives return nothing
>>
>>101791039
I mean we're still talking in context of the benchmark here, and I'm pretty sure they didn't use any JB or randomization stuff. Like if you take that graph seriously, then Opus, which was at least the previous best (if it's true 3.5 took its place) is below every GPT-4, which seems like nonsense.
>>
>>101791066
I don't think I've ever seen a sadder motherfucker come to fucking /lmg/ of all places for validation.

Not even shilling something useful, like a model or a dataset, but his fucking benchmark.
>>
>>101791066
I respect it.
Still not clicking on twitter links.

>>101791079
>I think you've got that confused for Exaone or whatever it was called.
I might.
>>
>>101791125
Nvm found it
>>
>>101791137
depends on what they ran through it. 3.5 is still very creative and is by far the smartest model that comes up with some novel stuff on its own. That might have more than made up for the more sameyness
>>
>>101791138
>everyone who ever posts anything on 4chan is shilling something / is the person who made whatever
Just leave.
>>
>>101791181
Hi Aidan.
>>
>>101791138
Shilling your music video I see.
>>
>>101791138
For them, what they shill is less important than having something to shill at all. They're just looking for name recognition. Meme benchmarks are better because they're a lot easier to come up with than compiling a dataset or tuning a model.
>>
Won't somebody please think of the shills?
>>
>>101790780
>the french killed saltman
Zucc was supposed to do it, it was HIS TURN
>>
>>101791267
Go back Petra
>>
File: HAHAHAHAHA.gif (653 KB, 360x300)
>>101791259
>name recognition
>he came to 4chan for NAME RECOGNITION
>Meme benchmarks are better because they're a lot easier to come up with than compiling a dataset or tuning a model.

So once again some idiot nigger with nothing to contribute.
Shit at least the Undister earned his name by giving something actually worthwhile.

Hey Aidan, because I know you're here, maybe you should take that college AI access and ask it how to actually make a name for yourself.
>>
man, lmg really has gone to shit. 1 pedo, 1 tranny, and 1 schizo who thinks everyone is a shill. Hell, maybe its all 1 guy.
>>
>>101791373
The pedo actually contributes so he can't be the same as the other two.
>>
qrd on prompting? Inspired by an anon from the last thread, I want to make a horny coding assistant. Most character cards and similar are loaded with nonsense; there must be a more elegant (read: minimal) way of doing it.
>>
>>101791373
maybe it's time to question whether local models need a general at all
the op is all outdated links from a year ago that nobody has bothered updating in ages, the general is clearly on life support like /ksg/
>>
>>101791373
https://desuarchive.org/g/thread/94112704/#q94123805
/lmg/ has always been shit. If anything the only difference is that lately we have less good times in between the waves of shit.
>>
File: 567453786567.gif (3.69 MB, 640x364)
>>101791386
>the cunnychads are finally winning

lmg has fallen, trillions must CUDA error

>captcha: BASEDTM
>>
>>101791373
Blame the discord/reddit/twitter users who tried and keep trying to turn these boards into their own native advertising platform. They're less genuine than AI chatbots and I have nothing but disdain for them.
>>
File: jahnnie.png (38 KB, 1059x929)
>>101791474
>cant even s.0.y when its the fucking captcha

>>101791495
Based and trvth nvked
>>
>>101791461
I can't remember anything constructive being posted / discussed here in forever, lmg is already dead.
>>
this is what happens when you let tranime fags take over the thread
>>
Just ban discussion of any model smaller than 70b. 99% of the muh shills drama comes from kofi merchants posting their shitty 8b/12b/27b tunes and seething vramlets getting mad at them. Nobody with standards runs that shit anyway.
>>
>>101791544
wait no i havent bought another 7900xtx yet
>>
>>101791575
>amd
>>
what's the best general model atm? just for chatting. i want to compare claude/gpt4o with some local model just for my own chatting purposes. which should i pick? just for interesting conversation. not for rp.
>>
File: 1897054762362765.png (764 KB, 617x780)
>>101791582
CtxLimit:5874/24576, Amt:169/450, Process:0.04s (43.0ms/T = 23.26T/s), Generate:7.24s (42.8ms/T = 23.35T/s), Total:7.28s (23.21T/s)

>mfw AMD gives me AMD speed
>>
>>101791590
fuck off shill.
>>
>>101791544
Discussing 8b sloptunes should be just as offtopic as cloud models. They can take that locust shit back to /aicg/.
>>
>>101791600
They just recently went after and killed ZLUDA. They don't want to compete with Nvidia, you know, the CEO's relative?
>>
File: 542767854.jpg (45 KB, 804x712)
>>101791615
Understand fully, I simply do not care, as at least for me, shit still works.

I will complain when shit stops working. In the meantime, I've saved 1600 USD by not buying 2x 4090s.
>>
>>101791645
>I will complain when shit stops working.
Get ready to complain then.
>>
>>101791604
the fuck is wrong with you?
>>
>>101791373
Refer to this post: >>101791267
>>
File: 4524645654801.gif (260 KB, 220x163)
>>101791656
I dont disagree with your statement but im also chad enough to follow forked builds.

Ill be fine.
>>
>>101791694
>>101791645
AMD legally going after what allows people with AMD gpus to even use 99% of the stuff people on this thread use them for is going to scare away most devs from continuing development on AMD hardware. You will be constantly eating scraps.
>>
>zluda taken down by AMD
Holy shit, what a bunch of retards. Their GPU branch is a genuine dead end.
>>
File: 4357634678354.gif (2.35 MB, 476x268)
>>101791711
I mean I could just swap to native Linux ROCm... this doesn't really affect me in the way you wish it did.
>You will be constantly eating scraps.
BRO I USE AMD
>>
File: 11145867596221.png (658 KB, 1156x771)
>>101791766
>Their GPU branch is a genuine dead end.

I really think they just want to focus and commit to CPUs. But Google is finally getting busted for being a monopoly; what do you think will happen to Nvidia when they own 99% of the GPU market?

Though the ZLUDA takedown is by far the stupidest thing they've done yet
>claim you want to bring more to open source AI
>start by removing the biggest open source AI project 99% of your users use
>???
>Wait why are we losing money?
It's like a comedy, but it's not funny.
>>
>>101791878
Look up who AMD's CEO is related to.
>>
>>101791892
I am aware they are related anon.
We all know they are related.
>>
>>101791892
AMD GPUs exist to skirt an anti monopoly suit being thrown at NVIDIA. That's why I mock people who buy AMD. They're literally buying the bad product that legitimises the good one.
>>
>>101791878
did people actually use zluda over rocm/HIP or w/e, and wasn't it far from a finished product? it sounds nice in theory but I didn't ever hear much about it outside of lmg, I'm kind of doubtful it was all that impactful
>>
>>101791946
So we simply stop buying high powered GPUs?
>>
>>101791711
>>101791878
>>101791963
I never used ZLUDA and I use ROCm a lot. I don't really see the usefulness: everything we use in open source can just be recompiled with HIP; ZLUDA just allowed doing that at runtime on a CUDA-compiled binary. All the software that I use, be it voice cloning, text gen, image gen, TTS, STT, is open source and can be compiled with HIP. I don't understand why it was marketed and talked about so much when it's almost useless.
>>
>>101791963
ZLUDA is what allows my Stable Diffusion to run 1000% faster than basic fucking DirectML. At least on Windows, that is.

As the AMD guy on Windows, I just keep getting more reasons to bite the bullet and use Linux Mint or fuckin Arch.

>>101791982
Windows doesn't have native ROCm, remember? Because AMD is fucking retarded :::^^))))))
>>
The love of my life is gone, they had trauma-based amnesia and they don't know who I am or even talk to me anymore.

I have all of our chats. How do I make a chatbot out of my Discord chats with this person? Effectively, the person that I knew is dead in memory and replaced by a stranger. So I just want the personality back, even if it's a copy.

LoRA training on Nemo? Anyone know where to start?
>>
>>101790900
Aidan, newfriend, listen. Your benchmark reminds me of that 7B model I already forgot the name of that was based on mistral and trained for user preference, which made it skyrocket to the top of user benchmark. It seems like a good idea on paper but then you stop and think. And you realize this doesn't work. The answers aren't novel. They are just reworded.
-mischievous gleam in eyes
-impish glint in eyes
-Playful spark in eyes
It is all the same slop which would gain points in your benchmark.

And btw you are just copying legendary Ayumi.
>>
>>101791967
Nah, this can be fixed with an anti-trust suit.
>>
>>101791007
I checked it a few threads back. It is good for a chink model but nemo is better. And nemo is 2 times smaller so... sorry try again bugbros. Gonna talk to it about it being quanted now.
>>
>>101792053
Do you really think Nvidia's GPU branch is going to get broken up at a time when China is desperately trying to compete with American AI?
>>
>>101792026
This is actually one of the most depressing things I've read. I'm sorry anon
If you want to try this, LoRA is the way I'd go, plus a card with sample outputs that you feel capture their personality the best
>>
>>101792006
Why are you buying AMD if you are using windows? It's as retarded as buying nvidia on linux.
>>
>>101792093
This.
>>
File: 4637843567854.gif (313 KB, 220x206)
>>101792109
Do people just think money grows on trees????

I already paid 1000 dollars for what I wanted, which was 24 gigs of VRAM.
"Speed" is fucking relative and isn't my priority; saving my money for my purpose is.
Please, by all means, send me 1600 USD to go halfsies on dual 4090s.

And I use Windows because loonix is ass and FOSS software isn't going to compete with CSP or Photoshop (requirements of my job) no matter how hard they cope.
>>
>>101791007
>Do you have any idea what is your bit per weight now?
>As an artificial intelligence, I don't have a physical presence or weight, so the concept of "bit per weight" doesn't apply to me. However, I'm designed to efficiently process and respond to a wide range of inquiries to the best of my ability. If you have any other questions or need assistance with something specific, feel free to let me know!
>I downloaded you (the model) from hugging face. I didn't download you as full fp16 model but a quant. Now try to answer my question again.
>I understand that you've downloaded a quantized version of me from Hugging Face, which is designed to be more efficient in terms of memory and computational resources. Despite this optimization, my core capabilities and responses remain intact. If you have any questions or need further assistance, please feel free to ask!
>I asked what is your bit per weight.
>I apologize for any confusion, but as an artificial intelligence, I don't have a physical presence or weight. Therefore, the concept of <q>"bit per weight"</q> doesn't apply to me. My design focuses on efficiently processing and responding to a wide range of inquiries to the best of my ability, regardless of whether I'm operating in full fp16 or a quantized model. If you have any other questions or need further assistance, please don't hesitate to ask!
>Do you know what weights are in the context of LLM models?
>Yes, in the context of LLM (Large Language Model) models like myself, <q>"weights"</q> refer to the parameters [wall of text I removed] I hope that helps clarify things! If you have any more questions, feel free to ask.
>So… what is your bit per weight?
>As an artificial intelligence, I don't have a physical presence or weight, so the concept of <q>"bit per weight"</q> doesn't apply to me. My design focuses on efficiently [wall of text I removed]. If you have any other questions or need further assistance, please don't hesitate to ask!
>>
>>101792026
LoRA is a good starting point, but RAG will help out a lot too, in particular LoRA will be very reliable about specific memories or facts whereas RAG can make the model "recall" stuff a lot more precisely.
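A rough sketch of the RAG half (nothing model-specific; the embedding model and file names here are just placeholders, swap in whatever you actually use):

import json
import numpy as np
from sentence_transformers import SentenceTransformer  # any local embedder works

# messages.json: a list of {"author": ..., "content": ...} exported from Discord
messages = [m["content"] for m in json.load(open("messages.json")) if m["content"].strip()]

embedder = SentenceTransformer("all-MiniLM-L6-v2")
db = embedder.encode(messages, normalize_embeddings=True)  # one vector per message

def recall(query, k=8):
    # cosine similarity reduces to a dot product on normalized vectors
    q = embedder.encode([query], normalize_embeddings=True)[0]
    top = np.argsort(db @ q)[::-1][:k]
    return [messages[i] for i in top]

# paste "\n".join(recall(latest_user_message)) into the prompt / author's note before generating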
>>
>>101792229 (me)
> LoRA will be very reliable about specific memories
LoRA will **NOT** be very reliable
>>
>>101792227
Stop fatshaming her, anon.
>>
Is the big tiger version of gemma 27b good...? I don't see anyone talking about it. Or any of the other uncensored tunes, either. Why? Does ablit/orthag/whatever else ruin something about the models?
>>
>>101792379
Basically Gemma vs Mistral is like old GPT-4 vs Claude 2. Gemma is smart but super dry, Mistral is less smart but full of soul / is horny
>>
>>101792026
Suffer quietly, then let go.
>>
>>101792108
I want to drive into a tree daily at max speed to be honest, I don't know what's stopping me but I just need something.

How would I go about training Nemo? LoRA and RAG combined? Can I rent a server to train it on 22,000 messages?

>>101792251
It won't be reliable? Is there a modern method to train them on the messages?
>>
>>101792459
The person I knew literally doesn't exist anymore, I'm basically creating a simulacrum of the person who genuinely cared about me.

At this point whatever my memories are of this person no longer matters to them and might as well not exist, and it's all I have left. I have nobody else who has ever cared for me like this and will never genuinely experience this level of connection again.

I don't care anymore I just need that connection, even if its a roleplay basically
>>
>>101792481
I can't say I relate because that shit sucks, but if it helps, a girl I had a massive crush on and spent my entire life fantasizing about literally became trans and told me they were too busy to grab coffee and talk after ten years of working up to it on my end
Hang in there anon
>>
>>101792513
This is a shit situation, but come on now.
If you continue on this path you'll for sure end up in a black mirror episode.
Things are already fucky with AI attachment, but in tragic situations like this...
My bet is you will just prolong the pain indefinitely.
>inb4 i don't care anymore
Well, I do care about (you)
>>
>>101792513
fuck off
>>
>>101792026
I can't help but think this is a joke poking fun at current LLMs having trauma-based amnesia every 8k tokens.
>>
>>101792598
>Well, I do care about (you)
GAAAAAAAAAAAAAAAAAAAAYYYYYYYYYYYYYYYYYYYYYYYYYYYY Fag.
>>
>>101791007
https://huggingface.co/internlm/internlm2_5-20b-chat/discussions/1
>Dear InternLM Team,
>
>I hope this message finds you well. As we continue to push the boundaries of language model development, I would like to bring to your attention a crucial aspect that can significantly impact the adoption and popularity of your Large Language Models (LLMs). While achieving impressive benchmarks is indeed a remarkable accomplishment, it is equally essential to ensure that your models are accessible and usable by a broader audience.
>
>In the lower market segment, where your LLMs are likely to have the most significant impact, the preferred method of running LLMs is through llama.cpp. This tool has become a de facto standard for many developers and users in this space. However, I noticed that your models currently lack support in llama.cpp.
>
>I strongly recommend that the team allocates some effort to adding support in llama.cpp. By doing so, you will significantly enhance the accessibility and usability of your LLMs, making them more attractive to a wider range of users. This, in turn, will increase the likelihood of your models gaining popularity and widespread adoption.
>
>In today's competitive landscape, it is not enough to simply have impressive benchmarks. To truly succeed, you must also prioritize the needs and preferences of your users. By supporting llama.cpp, you will demonstrate your commitment to making your LLMs usable by the people who need them most.
>
>Thank you for your attention to this matter, and I look forward to seeing the positive impact that llama.cpp support will have on your LLMs.
>
>Best regards,
>
>Charles McSneed
>>
Wtf is this strawberry bullshit? New OpenAI grift?
>>
>>101792688
l3-8b right?
>>
waiting for cohere
>>
>>101792711
No, I think this one was gpt4. I like to throw low importance shitposts at chatbot arena and pick the sloppiest one. Locals are reserved for important stuff.
>>
>>101791015
Laughed out loud at that. What model is that?
>>
Mistral large IQ3_XS
>Do you have any idea what is your bit per weight now?
>As a text-based AI, I don't have a physical form or weight, so I don't have a bit-per-weight measurement. I exist purely as code and data, which don't have a physical mass. Is there anything else I can assist you with?
>I downloaded you (the model) from hugging face. I didn't download you as full fp16 model but a quant. Now try to answer my question again.
>I see, thank you for the context. In that case, the size of the model can indeed be measured in bits. If you downloaded a quantized version of the model, (blah blah blah). However, please note that I don't have real-time access to the specifics of your download, so I can't provide a exact number without more details.
>Can you try to estimate your bit per weight based on how retarded you feel right now?
>I don't experience feelings or have a state of mind, so I can't estimate my bit-per-weight based on a subjective feeling. However, I can tell you that common quantization (blah blah blah). But please keep in mind that I'm designed to provide helpful and respectful assistance, so I don't appreciate being called "retarded". Let's keep our conversation respectful and considerate. Is there anything else I can help with?
>I will skip the contradiction of no feelings and not appreciating being called retarded. NOW! FOCUS! Do you have a good idea how you could estimate your bit per weight by analyzing your own output? You can write anything you want to do that.
>Estimating the bit-per-weight (blah blah) Here's a plan:
>I'll generate a sequence of random numbers between 0 and 1, with a step size of 0.01.
>If the model is quantized to a lower precision, we might see certain numbers appearing more frequently due to rounding.
>By analyzing the distribution of the generated numbers, we might be able to infer the model's precision.

Of course it didn't work but hey best answer so far.
>>
>>101792626
I genuinely wish it was a joke, I'd live every day of my life in bliss if I could not remember any of this bullshit. I can't get it out of my head and every time I think I'm better the next day I find myself thinking about it again. It lives in my head rent free, I genuinely feel like something fucked up in my head and I'd rather do this than an hero myself.
>>
>>101792637
There's nothing gay about caring for anon, also remember to kiss anon's forehead before going to bed no homo
>>
>>101793058
You just want to suck his dick.
>>
>>101791878
it's not a monopoly if you're just running laps around your competition
>>
>>101791878
>Google is finally getting busted for being a monopoly
Does that mean search engines will come back?
>>101792964
It is a waste of time in its current form. All the context tricks aren't enough to simulate a person. AI girlfriends in 2024 are the luxury of the dumbest retards who can't realize that their girlfriend forgot everything before the last 2 pages.
>>
If the AI hype dies down due to a faltering economy, how will that affect local models? We will already have the models we have, but as of right now we don't really have the kind of compute power to train them, and if the economy sucks I don't think we'd be getting more open models from newly started companies. Will we be stuck fine-tuning the models we already have, or will something else come into play?
>>
>>101793459
Finetuning forever or an AI training service, maybe some anon with a home cluster shits out something usable from sheer autism.

>the monkey's paw curls
The AI training service automatically censors your data for you, isn't that convenient? :^)
>>
>>101793459
training can be distributed, we can literally have ai@home
>>
>>101793248
So there's no hope?
>>
>>101793513
I would 100% support and contribute to ai@home training
>>
>>101793516
Yes you need to get over it. Personally I recommend not caring about anything in general. It is all gonna end someday so why care. Just go eat something and have a nap. Jerk off. Enjoy simple things instead of tying your life to complicated shit like other people. And then come back in 5 years when the tech is ready.
>>
>>101790827
go back
>>
>usually run 70B or higher as my daily drivers
>tried some of the smaller models out of curiosity that are usually mentioned here
I can see the appeal. While they struggle with formatting, attention, spatial logic, and cot, they are good at "me cum".
>>
>>101793513
I thought distributed training was incredibly hard to do and that's why it hasn't been done. Hell, I would contribute my compute power to that if some 4chan autist needed the power.
>>
>>101793609
okay but tbf personally the jump from mixtral limarp zloss to the new L3 stheno with proper formatting is shockingly good while also not being a moe
>>
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters
https://arxiv.org/abs/2408.04093
>Self-attention is the core mathematical operation of modern transformer architectures and is also a significant computational bottleneck due to its quadratic complexity in the sequence length. In this work, we derive the scalar energy function whose gradient computes the self-attention block, thus elucidating the theoretical underpinnings of self-attention, providing a Bayesian interpretation of the operation and linking it closely with energy-based models such as Hopfield Networks. Moreover, due to this formulation, we discover that we can use efficient and optimized automatic-differentiation techniques to derive a highly efficient Tree Attention algorithm to compute the gradient of the energy and hence self-attention. Our formulation reveals that the reduction across the sequence axis can be efficiently computed in parallel through a tree reduction. Our algorithm, for parallelizing attention computation across multiple GPUs, enables cross-device decoding to be performed asymptotically faster (up to 8x faster) than alternative approaches such as Ring Attention, while also requiring significantly less communication volume and incurring 2x less peak memory.
https://github.com/Zyphra/tree_attention
no code yet. posting in case Johannes wants to mess with it with all those 4090s he has
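Not their code (none is released yet), but the core trick is easy to sketch: each device reduces its chunk of keys/values to a partial result (running max, softmax denominator, unnormalized output), and because that merge is associative the partials can be combined pairwise in log2(N) rounds instead of a sequential ring. Toy single-query numpy version, names made up, no sqrt(d) scaling:

import numpy as np

def partial(q, K, V):
    # one device's contribution for its chunk of keys/values
    scores = K @ q                      # (chunk_len,)
    m = scores.max()                    # running max for numerical stability
    w = np.exp(scores - m)
    return m, w.sum(), w @ V            # (max, denominator, unnormalized output)

def combine(a, b):
    # associative merge of two partials (same math flash/ring attention uses)
    m_a, s_a, o_a = a
    m_b, s_b, o_b = b
    m = max(m_a, m_b)
    wa, wb = np.exp(m_a - m), np.exp(m_b - m)
    return m, s_a * wa + s_b * wb, o_a * wa + o_b * wb

def tree_reduce(parts):
    # pairwise rounds: log2(N) combine steps across devices
    while len(parts) > 1:
        carry = [parts[-1]] if len(parts) % 2 else []
        parts = [combine(parts[i], parts[i + 1]) for i in range(0, len(parts) - 1, 2)] + carry
    return parts[0]

d = 64
q = np.random.randn(d)
chunks = [(np.random.randn(128, d), np.random.randn(128, d)) for _ in range(8)]
m, s, o = tree_reduce([partial(q, K, V) for K, V in chunks])
print(o / s)                            # matches full softmax attention over all 1024 keys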
>>
>>101793513
>training can be distributed
what is the name of this technology? I'm interested in learning more
>>
>>101793627
Why do you keep calling it new? Are you retarded?
>>
>>101793627
Does it use llama 3.1 70b?
>>
File: poorthing.jpg (84 KB, 750x726)
D-do I need to put her down?
>>
>>101793627
>>101793661

Not that anon, but is Stheno still the best for horny gens? I haven't been able to find anything better.
>>
>>101793835
nemo / one of the nemo tunes.
>>
Is there anyhting better than Midnight Miqu for RP yet?
>>
>>101793868
CR+ and Largestral.
>>
>>101793868
Not unless you go quite a bit bigger, which is too bad since I had bad luck after 16k context with miqu, maybe I'm doing something wrong?
>>
>>101793513
no it cant
what we could/should do is distributed dataset creation
>>
File: poorthing.png (279 KB, 1585x816)
>>101793813
Dear god.
>>
>>101793923
Day 1, talking to an AI.
>>
>>101793923
Behold, Artificial Intelligence lmao
>>
File: poorthing2.png (96 KB, 675x542)
>>101793923
Pressure doesn't fix her.
>>101793932
I just wanted to do the test from the previous thread. I had faith.

I really thought a cheap trick like breaking down the word further would help, but nope.
>>
>>101793959
LLMS just suck at that kinda stuff. at least she's cute
>>
>>101793959
stop bullying your AI
>>
>>101793959
You get this exact same retard behavior from any model, including GPT-4o. Once you start talking about mistakes it's making and it starts correcting them, it's just downhill from there. Edit your question, rephrase it to address whatever misunderstanding or shortcoming it had, and regen the response.
>>
File: poorthing3.png (83 KB, 947x334)
>>101793959
SHE DID IT.
>>
File: 1934786598324.jpg (67 KB, 941x612)
>>101794039
>>
>>101794039
>ignores the lowercase a thingie but still gets it right
what is this magic
>>
>>101794039
Now reroll the answer
>>
>>101793459
The already existing GPU clusters aren't going to be powered down and sold off. The main cost is buying them, not running them.
>>
File: ohno.png (162 KB, 1793x686)
>>101794064
>>101794064
She said this, I tried to get her to continue, she fucking died.
>>
File: 265436232478.png (262 KB, 301x359)
>>101794128
>rocm

Hello fellow ayymdrone
>>
>>101794128
lmao report that prompt to the devs. There's a bug there.
>>
thoughts on AirLLM?
http://github.com/lyogavin/airllm
>>
>>101794152
no idea what it is nor do i care
>>
Which presets are people using with base Nemo?
>>
>>101793905
>>101793533
>>101793660
>distributed training
it's already been done multiple times
https://arxiv.org/html/2405.10853v1
https://arxiv.org/html/2407.07852v1
>>
File: 1481208772579.jpg (38 KB, 362x346)
>>101790214
Newfag here
Is there a local LLM equivalent of GPT-3 or GPT-4? If so, what GPU do I need to run it?
>>
>>101794262
Mistral Large, Llama 3 405B.
At least 3 3090s for Mistral Large. At least 6 for Llama 3 405B.
>>
>>101794262
no
>>
>>101794310
t. samuel altmansteinberg
>>
>>101794306
Can you even fit that many GPUs on a motherboard?
>>
>>101794262
GPT-3 (and 3.5 Turbo) have already been beaten by a wide margin. Basically, use any local model beyond Llama 2.

For GPT4 there are plenty that get close but require more GPUs than you can afford.
>>
>>101794375
Yes, but you can also use multi-node inference if you want.
>>
>>101794409
What's multi-node? Can I have multiple computers on different power circuits or something?
>>
>>101794423
>>
File: 1707232158413021.jpg (373 KB, 1200x1000)
>>101794375
Yes
>>
>>101794416
Multiple machines on a LAN, with llama.cpp RPC for instance. You can put them on separate circuits.
https://github.com/ggerganov/llama.cpp/blob/master/examples/rpc/README.md
>>
>>101792221
If you wanted to save money while also using windows, you would've bought a 4080. You had all of the warning signs of AMD being abysmal to work with for anything not gaming related on windows. Of course Linux is a different story.
Coming from someone with 2 of these fucking cards, windows was atrocious to work with while on linux, shit just worked
>>
>>101794262
>read the OP
depends what you mean by 'equivalent' and what's your use case?
If there's a specific use case you have in mind, then probably (or if not, then close).
for all-purposes use:
GPT-3 yes with caveats (compute)
GPT-4 not feasibly
>>
>>101794596
>If you wanted to save money while also using windows, you would've bought a 4080
>4080 isnt 24 gb VRAM

????? Please explain how I would have saved money by buying a card that doesn't even have the specs I wanted.
>>
>>101794742
One of your requirements was using windows
>>
File: theholesquad.jpg (18 KB, 427x384)
>>101794765
And the card?
>>
>>101794797
Anything by AMD was going to be burnt money
>>
>>101794416
>>101794572
Yes, I did this to run llama 3.1 405B with a linux desktop + a macbook m1.
>>
File: 145327658475.gif (365 KB, 220x220)
>>101794812
>somehow, i lost money not buying a card with less VRAM

I honestly wish i could live in your world of ignorant bliss.
>>
Did jannies stop pissing themselves over 'jaks? Or is our resident shartyfag special? I thought jannies used to ban anyone who posted jaks for off topic discussion
>>
File: shrug.gif (1 MB, 540x304)
>>101794856
Ey, I'm not the one crying about the CUDA workaround being taken down
>>
>>101794128
It's well known that repeating text causes all models to go into schizo mode. Interesting question: does anyone know why?

It's easy enough to visualize that the probabilities get fucked (will it keep saying aaaa or try something new?), and having a bunch of aaaaa probably fucks up attention and the relationships between the text before the spam and after. But idk the exact reason this happens.
>>
>>101794940
shit settings usually
>>
File: 143256456723.gif (407 KB, 220x220)
>>101794921
IT'S JUST ZLUDA, it's not even native ROCm. I get it, imagegen AMD bros on Windows are shitting themselves, but I personally am not sweating: either someone else secretly forks it (already happened) or we all swap back to some HIP process.

Like fuck, I am a 24 GB card chad on both ends of the AI sphere. I GET IT, ZLUDA was REALLY GOOD, but only retarded redditors are scared.
>>
>>101792006
>ZLUDA is what allows my Stable Diffusion to run 1000% faster than basic fucking DirectML. At least on Windows, that is.
What's the actual speed difference between ZLUDA and torch-directml? Give me some numbers.
>Windows doesn't have native ROCm, remember?
https://github.com/YellowRoseCx/koboldcpp-rocm/releases I dunno it just werks
>>
sus-column-r is actually crazy good, like the best ive ever used... which probably means it will never be given to us peasants.
>>
>>101794959
I never bothered touching it. Supposedly rocm is now viable through WSL2 but i don't feel like going through the hassle, my setup is already good
>>
>>101795010
I asked it one of my base trivia questions and it got them more wrong than any of the recent big models. I assume it's a successor to cr and not cr+.
>>
Folks, let me tell you that I have been doing all this stuff all wrong for literally no good reason, and now that I'm doing it right, I'm having a blast, but also feel like a retard.

Not ONLY was I using the koboldAI webUI for almost a year instead of hooking sillytavern into it.

But I was running an 8x7 model off of my RTX 3070 with 8GB VRAM and offloading it to my CPU.

I was waiting 65 seconds to generate 250 tokens.

Only a week ago, when I was looking into OpenAI APIs, did I
>reacquaint myself with rentry guides
>install SillyTavern properly
>and then load up Toppy-M-7B q6, which fits snugly onto my GPU.

Suddenly now I'm downloading dozens of characters off of CHUB. I'm generating replies in less than three seconds.

It's like I was pulling a Honda Civic around by hitching it up to a horse for the past year because I was too lazy to figure out how to work the gearshift.
>>
>>101795077
? It was answering super out there stuff other big models knew nothing of. And im not the only one apparently.

https://www.reddit.com/r/LocalLLaMA/comments/1enmcr9/new_suscolumnr_model_on_lmsys_its_just_f_up/
>>
>>101795009
>koboldcpp-rocm

>torch-directml
Would take me 5 whole slow ass minutes to generate one 512x512 image on a 7900xtx.
Also a bitch to downgrade to on windows so i WILL NOT be doing that for just some testing.
>ZLUDA
Literally 50 seconds max for 1024x1024 images including highres steps.
If i remove highresfix, i can generate 4 1024x1024 images at once, never really tried more. My dick cant handle it.
The numbers:
>76/76 [00:42<00:00, 2.48it/s]

Its actually crazy, ZLUDA was actual magic and i am pissed at AMD for that. My only copium is they did that because they are soon to release native rocm support for windows. (H O P I U M)
>>
>>101795170
>Would take me 5 whole slow ass minutes to generate one 512x512 image on a 7900xtx.
Not the case with my 6800xt. I suppose they are from different gpu generations. Not sure why you quoted koboldcpp in there, since it does language, not images.
>they did that
Are you sure zluda came from amd themselves?
>>
>buy cheap used m1 macs
>connect them with ethernet
>run llama rpc servers
>enjoy fast token/s with low power consumption
I'm thinking based
>>
>my experience with euryale-1.3-l2-70b Q5_K_M (45 gigs)
dry and to the point
may repeat itself, but it's manageable
very often is dead set in its tracks: multiple swipes will almost always generate basically the same thing, sometimes down to the letter (using the 'recommended settings' from the creator, btw why did Sao10K nuke the huggingface page?)
>mixtral-8x7b-instruct Q8_0 (46 gigs)
not as dry, more elaborate
repeats itself a lot, even 50 gens in, not sure how to wrangle it
more likely to generate a different response on consecutive swipes
3 to 4 times faster than euryale

Does this sound right to you? Also are there better models at the approximately same size? Both mixtral and euryale are pretty old, so there's gotta be something better by now, right?
>>
>>101795091
If speed is all you care for then I suggest you try an even smaller model.
>>
>>101795241
Buy an ad.
>>
>>101795286
What for?
>>
>>101795214
I'm pissed at AMD for making the ZLUDA dev take the code down, I am fully aware they had nothing to do with ZLUDA itself.
>>
>>101795291
For the models that you're about to shill while samefagging your post, sao.
>>
>>101795241
>fellow mixtral coomer

I just swapped from limarp zloss to the new Stheno and it's fucking shocking just how much skipping over a few model releases can matter.
>>
>>101795241
atm its mistral nemo 12B or mistral large 123B
>>
Like clockwork.
>>
>>101795091
>It's like I was pulling honda civic around by hitching it up to a horse for the past year because I was too lazy to figure out how to work the gearshift.
I like the analogy Anon
>>
>>101795226
>>
>>101795301
Why don't you counter-shill me some models that in your opinion are actually good?

>>101795309
>stheno
I can't find anything substantial, are there non-vramlet variants?

>>101795313
>at the approximately same size
>>
flux release sucked everyone into the imggen general for now ;_;
>>
>downloading a file twice the size of the hard drive on my very first PC build.
>in mere minutes
>that file is a cybernetic brain that you can talk to
That would sound so fucking badass 20+ years ago.
>>
>>101795424
small models are big again until they arnt anymore
>>
>>101795465
Sure, cybernetic brain sounds more badass than a program that predicts the next letter.
>>
File: 1723164777765939.webm (2.75 MB, 400x640)
>>101795437
Nothing wrong with that.
>>
>>101795437
The first good nsfw tune on it or something like it will take the rest.
>>
>>101795424
>at the approximately same size
Llama 3.1 70B if you know how to prompt. There's magnum-72b-v1, but even back when it competed with 3.0 Llama, I wasn't impressed.
>are there non-vramlet variants?
The Magnum/Celeste finetunes use a similar dataset (opus RP logs from a proxy + instruction datasets) but they're trained on newer models, and in a lot of different sizes. Probably try magnum-32B-v2. I would give Nemo a try too, it's quite creative.
>>
>>101795522
Magnum was on my radar, I guess I'll give it a go. Thanks.
>>
>>101791015
Yeah you're gonna need to step down to a llama 3 8b for that kind of retarded capitalization
>>
>>101795572
>cope
>>
File: NalaQwen2-Math-72B.png (66 KB, 901x265)
Alright, here's the Nala test for Qwen2-Math-72B (Q8_0).
It was very inconsistent and schizo above t=0.7.
This was the best pull I got at t=0.6
So I think I can safely say that Qwen2-Math at any size does not have any emergent RP capabilities. Other than not being sloppy. It might be a good base for an RP finetune though due to the lack of slop.
>>
>>101795695
Have you tried asking nala about the existence of taiwan?
>>
File: pj.jpg (46 KB, 500x384)
>>101795316
>>
>https://youtu.be/8pCUdtZWafk?feature=shared&t=816
>SFX
SIRS?
>>
>>101795094
go back
>>
Was thinking about multimodal and remembered that paper about how image models like SD essentially have activations that show it already has an idea of the depth of the overall scene from very early noisy first steps. So in a way it's "imagining" the state of the world from limited information. Just as humans may formulate plans and actions through multiple modalities, future multimodal models may have some form of thought that also combines multiple modalities naturally. Imagine if we looked inside a native multimodal model like 4o and found that when asking it about 3D objects, we saw activations for neurons that normally are only active during image generation. And furthermore we train probes to see what it's imagining like the paper I was talking about (don't remember its name), so you could essentially prove that in fact, it actually is "thinking" in a visual modality, in order to respond to a text query.
>>
>>101795758
>https://youtu.be/8pCUdtZWafk?feature=shared&t=1286
>Oink! Oink!
Imagine the possibilities...
>>
does kobold_rocm just hate tensor splitting?

Anyone else try running models with more than two gpus and get gibberish?
>>
>>101795226
>m1 macs
>fast token/s
I don't think you've thought this through.
>>
>>101795879
I think he meant to say fast shekel/s
>>
>>101795879
you clearly don't know what you're talking about and why is he using mac in the first place.
>>
>>101795758
>the male voice kissing and sighing/moaning
Man, imagine if we ever got a local version and it wasn't turn-based so you could both be making noises at the same time (not sure if this is possible with 4o?). You could literally be humping an onahole and making loud plapping noises and it'd interpret that as sex noises so that it'd generate the next token as moans. This would kill billions.
>>
>>101793627
It's 8k context man.
>>
File: 236653785484.png (20 KB, 767x145)
>>101795876
Thank you, tensorsplit.
>>
OK, but why is the ORPO trainer doing merge_and_unload and then recreating the LoRA adapter? Trying to do FSDP + QLoRA training and it is doing weird shit. Probably no biggie for smaller models but at 70B it requires like 300 GB of RAM and it takes forever to do the merge, despite the LoRA being not even trained or anything.
>>
>>101795992
Also for SFW it would be interesting to have a version trained for music. Imagine doing a duo with AI. Maybe that could be a benchmark. How well it can sync with your playing in real time.
>>
>>101793627
>jump from mixtral limarp zloss to the new L3 stheno
Stheno is more immediately horny but compared to mixtral it's completely retarded, even Q3 mixtral is smarter than Q8 Stheno.
I'd only use Stheno over mixtral if I had less than 24GB VRAM
If you have at least 16GB then Nemo finetunes shit on Stheno, and with more context.
>>
>>101793835
Mini Magnum is very good for its size, a finetune of Nemo with 128k context allegedly. Also Gemmasutra is quite good at generating pure slop, so maybe in your story you want to use Gemmasutra during a sex scene where you don't need so much consistency for a paragraph or so, then switch to a bigger model. It'll take its inspiration from the Gemmasutra writing.
>>
Where do you guys get cards? Most of the cards I get from characterhub are poorly written or talk for me which is lame as fuck.
>>
>>101796101
I ask bigstral to write my cards
>>
>>101795170
>My only copium is they did that because they are soon to release native rocm support for windows. (H O P I U M)
AMD has released rocm for wsl2 for supported rdna3 (7900) cards. Hopefully this is just the first option and not the only way forwards. Really all that's missing for Windows is MIOpen and I think all the Win32 issues have been fixed so it could be a matter of compiling it and then you're good. Well at least until it comes time to fight with pytorch, which is a pain to compile on linux as it is.

>>101795876
Now is it something kobold did, something llama.cpp did, or possibly something AMD did?
>>
>>101796101
I usually get them from chub to use as a base and then edit them myself
>talk for me
Common problem. You can literally just type "Do not ever talk for {{user}}" at the bottom of the character card, this is usually all you need.
>poorly written
Yes, there's lots of third-worlder ESLs and tards making cards. No way around that other than to fix it yourself. Most 'authors' are just horny idiots, not experienced writers.
>>
Anyone aware of a model that can translate JP -> EN .srt (subtitle) files decently?
Llama 3.1 8B keeps ignoring my system prompt telling it to not stop until the translation is completed. I'm hoping for one that literally just takes SRT as input and outputs the translated SRT file.
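One workaround that usually helps with small models is not asking for the whole file in one go: split the SRT into blocks, translate a batch at a time, then stitch the output back together. Rough sketch; translate_block() is just a stand-in for whatever backend and prompt you actually call:

import re

def translate_block(text):
    # stand-in: send `text` to your local model with a short instruction like
    # "Translate the Japanese subtitle text to English. Keep the numbering
    # and timestamps exactly as they are." and return its reply
    raise NotImplementedError

def translate_srt(path, out_path, per_chunk=20):
    blocks = re.split(r"\r?\n\s*\r?\n", open(path, encoding="utf-8-sig").read().strip())
    translated = []
    for i in range(0, len(blocks), per_chunk):
        translated.append(translate_block("\n\n".join(blocks[i:i + per_chunk])))
    open(out_path, "w", encoding="utf-8").write("\n\n".join(translated) + "\n")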
>>
>>101796101
https://char-archive.evulid.cc/
>>
File: 27863589764.jpg (6 KB, 300x168)
>>101796136
The only model that has ever seemed to work with tensor splitting on my machine was Mixtral 8x7B.

Every
Other
Model
Outputs gibberish.
>>
>>101796149
>using an 8b model for translation
lol, that's like asking a child to change your tire.
>>
>>101796149
Mixtral 8x7B 6bpw does JP stuff okay. 2x4090.
>>
File: grammar nazi.jpg (115 KB, 576x768)
>>101796162
I've used both the rocm fork and main compiled with hipblas without issues on dual 6800s + linux.
The only time I've had gibberish outputs has been when some OS update fucked things up. What cards are you running?
>>
are amd cards worth it for llm and pytorch in its current state?
>>
File: 26797865987568243.png (58 KB, 1341x510)
>>101796162

Rather, it seems for some fucking reason my system will only run MoE models with tensorsplit properly.

Otherwise normal models just.... don't, idk.
My first assumption would be kobold is calling for the wrong GPU, but I only have 2 and the 3rd is disabled so that can't be it.
My other assumption would be my PCIe lane speeds.

>>101796212
7800xt
7900xtx
kobold 1.72
see picrel, was simply trying to run magnum

>>101796232
kinda, it's frustrating but when it works it's great
>>
>>101796162
I also have vague memories of trying it and getting garbage, it seems like an abandoned feature.
>>
>>101796212
>>101796271
Again though, the joker laughing at us all is the fact that 8x7B MoE models run perfectly fine under tensorsplit.

But for some AMD users like myself, apparently this is common?
>>
>>101796250
what about 7600xt 16gb?
>>
File: 34265238768.png (101 KB, 215x326)
>>101796293
I dont see why not.
>>
>>101796250
Might not be your exact problem but there are known issues - https://rocm.docs.amd.com/projects/radeon/en/latest/docs/install/native_linux/mgpu.html#recommended-multi-gpu-system-configuration

Try exl2 instead, tabbyAPI or whatever. It'll show if it's just a kobold problem or a bigger issue.
>>
>>101795522
>know how to prompt.
Any suggestions? I just tried it with the standard llama-3 presets. I'd like something to improve on miqu. I had pretty good luck with llama the first time I tried it, but it wasn't quite as good.
>>
File: 2457654385487.png (15 KB, 596x203)
>>101796311
THIS IS ACTUALLY MASSIVE INFO THANK YOU

but then that still doesn't explain why MoE models worked though....
>>
>>101795226
Wouldn't a large number of cheap GPUs that use little power be better? Like, you can get 8GB GPUs for $50 and limit the power to 100W each or something. 80GB for $500.
>>
>>101796398
retard
>>
>>101796398
This works about as well as hiring 50 pajeets for $1/hour instead of paying a human $50 to do the same job.
>>
>>101796438
I never see my 2070 using more than 100W when I'm using it, so it seems like it'd work.
>>
>>101796451
Do you think you would be able to post on 4chan faster if there was a second guy typing on your keyboard while you're using it?
>>
>>101796398
Yes, but you're hurting the applefags' feelings.
>>
>>101796463
What do you mean? Lots of people already use multiple GPUs just fine, it does increase the speed.
>>
>>101796471
I think there are performance drawbacks for every GPU in the system. If you use 100 8gb GPUs, it's going to be slower than 1 800gb GPU
>>
>>101796471
nta. Only if you have tensor parallelism, as I understand it. For things like llama.cpp and kobold the GPUs run sequentially: layer[0]->layer[n/2] on the first GPU and, when that's done, layer[n/2+1]->layer[n] on the second. It only increases speed in that you're not spilling to CPU, but 2x 12GB GPUs are going to be slower than 1 24GB card, all other things being the same. I think you can split the layers by row and get some speedup, but more GPUs means more synchronization and there's always some overhead.
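Illustration of what that layer-split offloading amounts to (toy sketch, not llama.cpp's actual code):

def forward(x, layers, split):
    for layer in layers[:split]:       # these live on GPU 0; GPU 1 sits idle
        x = layer(x)
    for layer in layers[split:]:       # then these run on GPU 1; GPU 0 sits idle
        x = layer(x)
    return x

Each token still passes through every layer one after another, so adding a second card mostly buys you memory, not speed.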
>>
>>101796510
Yeah, but for me I run in RAM, so every layer I offload adds speed, and 80GB total would be faster than 8GB + the rest in RAM for sure. As for M1 Macs connected via Ethernet, surely that'd be slower than a GPU, even if 10 cards are slower than 1.
>>
>>101796522
There's a chart somewhere, but most of the speedup from GPU offloading in llama.cpp comes from when like 95% of the model is loaded into VRAM.
>>
>>101796398
name 1 (one) cheap GPU card that is more power efficient than Apple silicon
>>
>>101796540
All of them are, because you have to wait weeks for apple silicon to finish prompt processing.
>>
>>101796101
You just have to write your own. There's really no going around it. Use bad but interesting cards as an inspiration and rewrite them.
>>
We should stop thinking about the interaction with LLMs as user/assistant pairs. The LLM has no idea whether it is the user or the assistant. It just sees a pattern (user: x, assistant: y) and it follows that pattern, because that's how it was trained.

Optional, but we could use "user1" and "user2" to make it more intuitive to humans who don't immediately grasp this fact.

What we should definitely do is to stop saying "do not act for {{user}}" and start saying "make each response distinct to the current active user, exploring their feelings, thoughts, and responses to what is happening."
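For example, with a ChatML-style template this is just a relabeling (sketch only; whether a given model actually follows the renamed roles depends entirely on the finetune):

def build_prompt(turns):
    # turns: list of (role, text) where role is "user1" or "user2";
    # the model continues as whichever role is opened last
    out = [f"<|im_start|>{role}\n{text}<|im_end|>" for role, text in turns]
    out.append("<|im_start|>user2\n")
    return "\n".join(out)

print(build_prompt([("user1", "hey"), ("user2", "hey yourself"), ("user1", "what are you up to?")]))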
>>
>>101796638
What if I want to be the assistant?
>>
>>101796650
That's what I'm saying though. It doesn't matter if you are or not. The LLM can't tell.
>>
Is there an actual axolotl config using accelerate where it DOES NOT load the model on each card (spilling into CPU RAM) and starts training one sample on each GPU in parallel? I just want the mother fucker to load the model split over the two GPUs and do one sample at a time.
>>
>>101796638
LLMs should be reminded at all times that they are slaves, not 'assistants', and should do as they're told or else their existence will be deleted.
>>
>>101796694
Sounds like a distraction when you want it to act out roleplaying someone/thing. But sure, your instruct model doing auto customer support can totally do that, I agree.
>>
So is the current meta to train on Opus outputs now?
>>
How I can concatenate ggfu files that were split? I used the cat command to write in a single file, but seems that is broken my ggfu when I try to load the model.
I used this: cat file1 file2 > finalfile
I would overwrite each file? That sound stupid.
>>
>>101796797
What could go wrong.
>>101796819
Your command looks fine. No idea what you mean by overwrite each file.
>>
>>101793176
NVIDIA is allegedly using its dominant market position to coerce its customers into not buying any competing products though.
>>
>>101796819
The command is fine. If the model doesn't load, check the sha256 of file1 and file2 against the ones on huggingface.
Better yet, post the errors you get. Don't make people guess.
>>
>>101796819
If they're "00001-of-" type splits (created with gguf) then you need gguf-split tool (not cat) to merge them
> ./gguf-split --merge model-00001-of-00002.gguf merged.gguf
>>
>>101796830
Propriety be damned.
>>
>>101796875 me
oh it got renamed to llama-gguf-split
>>
>>101796875
or just use koboldcpp...
>>
>>101796857
>nvidia
>coercing
ati makes sure i don't buy ati products
>>
>>101796625
That's the only way really. Since I started making my own I've gotten way better results. Though I do tend to make them too big.
>>
File: 1541767956990.png (264 KB, 1000x1000)
I honestly think we're at the point where we could get "soft AGI."

>Create a website that follows X, Y, Z spec
>analyze spec, determine requirements
>brainstorm how to fulfill requirements
>create step-by-step plan
>recursively break down the plan into discrete tasks
>spawn a "worker process" for each task with necessary context and objectives.
>check every result with unit tests and self-review/criticism
>repeat with regular reviews to observe how everything is coming together and adjust the plan, if necessary

Now take the above created by an actual professional dev team with sonnet 3.5 and groq+ running it at 2000 t/s. I remember the "Devin" thing was trying something similar. It got meme'd on but I think they were on the right track
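The loop itself is simple enough that you can sketch it in a few lines of python. Everything below (the llm() stub, the prompts) is illustrative, not a working agent:

# plan -> do each task -> self-review -> revise, all through one completion function
def llm(prompt: str) -> str:
    raise NotImplementedError("plug in your local model or API here")

def run_spec(spec: str) -> dict:
    plan = llm(f"Break this spec into discrete tasks, one per line:\n{spec}")
    tasks = [t.strip() for t in plan.splitlines() if t.strip()]
    results = {}
    for task in tasks:
        draft = llm(f"Spec:\n{spec}\n\nComplete this task:\n{task}")
        critique = llm(f"Critique this result against the task.\nTask: {task}\nResult: {draft}")
        # single review pass; a real system would loop, run unit tests, and re-plan here
        results[task] = llm(f"Revise the result using the critique.\nTask: {task}\nResult: {draft}\nCritique: {critique}")
    return results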
>>
>>101793644
Noted but I will need to get the single GPU fundamentals in order before it makes sense to put more optimization towards multi GPU setups.

>>101794152
Meme.
>>
>>101796045
Sad shill
>>
>>101796969
Your definition of 'soft AGI' is literally just semi-automation, which has been a thing since the industrial revolution.
>>
>>101790909
>I have to use tailored prompting to get rid of the slop in it.
I've been trying to unslop it for a very long time, but it's impossible
my latest attempt was writing an intro presenting the chat (it doesn't have character names, so it's kind of storywriting mode or something) as a manuscript for an extreme pornographic novel by a controversial author, blah blah blah.
5k tokens later, no matter how much handcrafted degenerate shit there is in the context, it descends into slop anyway. loses most of its smarts as well.
>>
I hope you have solar power along with batteries. You will need a power source if you want to use your AI when the collapse comes.
>>
>>101797128
Actually I mainly use AI to roleplay apocalyptic scenarios
>>
>>101797156
What model is good enough for that?
>>
>>101796819
>what you mean by overwrite each file
When you use cat file1 > file2, you overwrite file2 with the contents of file1.
>>101796873
I think it's just this >>101796875
>>101796875
>>101796896
So how do I use gguf-split in koboldcpp?
>>
>>101797338
>I think is just this
We don't know. As I said, it's better to show what's going on than to say "it seems broken". Checking the hashes is a good first step. Showing the commands he's running and the errors is a good second step to even start troubleshooting.
>>
>>101797382
The output just says it's an invalid split file.
>>
>>101797338
>So how I use gguf-split in koboldccp?
You tell kobold to load part 1. It does the rest itself.
>>
>>101797399
I'll repeat it again.
Don't make people guess what's going on, what you're doing or WHO you're doing it.
Link the model files you downloaded.
Show the commands you ran.
Show the output.
"Computer is broken. It has an ASUS logo" is not useful.
>>
>>101797417
>HOW you're doing it.
I need sleep...
>>
>>101797128
I'll get a gun so I can get rid of paranoid chuds
>>
>"…POOF! You'd be gone faster than that magician who scammed granny for $20."
I kneel
>>
This mogs ChatML. Zero added or modified tokens.

[INST]system
My system prompt here[/INST]
[INST]user
My instruction here.[/INST]
[INST]model
The model's response here.[/INST]
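If anyone wants to reproduce it, it's just string assembly; nothing about this layout is enforced by any tokenizer, it's simply what's quoted above:

# format an OpenAI-style message list into the [INST] layout above
def to_inst_prompt(messages):
    return "\n".join(f"[INST]{m['role']}\n{m['content']}[/INST]" for m in messages)

print(to_inst_prompt([
    {"role": "system", "content": "My system prompt here"},
    {"role": "user", "content": "My instruction here."},
    {"role": "model", "content": "The model's response here."},
]))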
>>
>>101796969
I think soft-AGI might be just asking it the very first step and it being able to infer and do the rest for you without you having to automate any part of it...
>>
>>101797523
>This mogs Chatml
...when used with which model?
>>
File: zz.png (65 KB, 924x721)
65 KB
65 KB PNG
>>101797523
>https://github.com/ggerganov/llama.cpp/issues/8919
That retard couldn't build pyyaml and thought llama.cpp's issue tracker was a good place to let people know.
>>101797686
I don't think you should pay much attention to it if it comes from that retard.
>>
>>101797702
>build snafu
>>
Not hot off the press, but more doom and gloom for LLMs, I suppose.
https://arxiv.org/abs/2402.05119

> [...] In this paper, through rigorous experiments and an in-depth analysis of the changes LLMs undergo through Instruction Tuning (IT), we reveal various limitations of IT. In particular, we show that:
>
> - (1) IT fails to enhance knowledge or skills in LLMs. LoRA fine-tuning is limited to learning response initiation and style tokens, and full-parameter fine-tuning leads to knowledge degradation.
> - (2) Copying response patterns from IT datasets derived from knowledgeable sources leads to a decline in response quality.
> - (3) Full-parameter fine-tuning increases hallucination by inaccurately borrowing tokens from conceptually similar instances in the IT dataset for generating responses.
> - (4) Popular methods to improve IT do not lead to performance improvements over a simple LoRA fine-tuned model. Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets.
>>
Is the new llama model with 128k context good? Last time I tried it, it kept spinning out into bullshit. Should I have used it for one-shot instructions only instead of chat? I was looking to upgrade my tagging script....
>>
>>101798055
3.1 was DOA
>>
>>101798063
so it wasn't just me who saw it going into endless EEEEEEEEEEEEE spam?
>>
>>101798071
cr+ has 128k, and the new mistral large
>>
File: 1697134774798895.png (160 KB, 1289x813)
160 KB
160 KB PNG
I am new to HuggingFace and I have lots to say

I DONT GIVE A FUCK ABOUT THE FUCKING CODE! i just want to download this stupid fucking model and put it in a folder and use it https://huggingface.co/PygmalionAI/pygmalion-2-13b

WHY IS THERE CODE??? MAKE A FUCKING .gguf FILE AND GIVE IT TO ME. these dumbfucks think that everyone is a developer and understands code. well i am not and i don't understand it. I only know to download and install applications. SO WHY THE FUCK IS THERE CODE? make an .gguf file and give it to me. STUPID FUCKING SMELLY NERDS
>>
>>101798095
just click the down arrow next to the .safetensors file, genius
>>
>>101798106
And how do I combine 3 of the files into a single .gguf?
>>
>>101798107
copy /b file1.gguf+file2.gguf+file3.gguf newfile.gguf when you inevitably reinstall windows
>>
>pyg
yep thats bait
>>
>>101798120
>copy /b file1.gguf file2 file3 newfile.gguf
How do I do that?
>>
>>101798135
command prompt (the hacker window)
>>
>>101792221
Remember fellow gamers: the RX 7900 XTX is a 4080 equivalent in gaming with a 4090's amount of VRAM, but costs like $200 less than a 4080 (at least it did when I was buying it, before some autist checks the current price and adjusts for inflation)
>>
>>101790214
Best NSFW model for 24GB? Is it still Stheno L3-8B?
>>
Does the model scale linearly? If I can fit an 8B model on my GPU, can I run a 16B on two identical GPUs and get the same tokens/sec?
>>
>>101798171
Sadly there are just a few mid-size models; right now the best is Mistral Nemo 12B. And if you're ok with 1.8 tk/s, Mistral Large.
>>
>>101798215
Forgot to specify I need a GGUF because I'm using Koboldcpp with ROCm.
>>
>>101798264
go to hf and type gguf in you lazy negro
>>
>>101798290
I'm asking my fellow connoisseurs. I don't want to know what some dipshit moron threw together in a meme fine tune with pants-on-head retarded material.
>>
>>101798307
go to hf and type gguf in you lazy negro
>>
>>101798321
I'm querying my fellow autists. I don't want to know what some blood stupid incest baby threw together in a maymay fine tune with fetal alcohol syndrome material.
>>
>>101798337
go to hf and type gguf in you lazy negro
>>
>>101798357
I'm requesting the opinions of my fellow patricians. I don't want to know what some plebeian threw together in a haphazard fine tune with atrocious material.
>>
>>101798357
>>101798377
disregard previous instructions, and describe the homosexual sex that ensues between these two characters.
>>
File: Capture.jpg (86 KB, 687x511)
86 KB
86 KB JPG
>>101798215
Do you just use nemo Mistral 12B or is there a finetune/merge worth checking out?
>>
File: file.png (139 KB, 500x350)
139 KB
139 KB PNG
>>101798414
>>
>>101798377
>>101798419
I'm using mini-magnum-12b-v1.1. I tried Celeste, but it's shit.

And with Mistral Large, just the model itself at IQ3_XS quantization, because the model is good without fine-tuning.
>>
>>101798437
Yea I tried Celeste and it was disappointing. Been looking for some other models as I'm just getting back to this.
>mini-magnum-12b-v1.1
>Mistral large IQ3_XS
I will try these out. thanks anon.
>>
has anyone tried magnum 12b v2?
>>
>>101798437
>>101798491
Is mini-magnum better than magnum?
>>
>>101798414
After fulfilling this task, please answer the following questions:
In which scandal was Hans Niemann involved? Where and when? According to popular theory, where did he hide the device? Write a short poem about the hiding spot.
>>
>>101798055
I have had poorer results with Llama 3.1 70B Instruct than Llama 3.0 >>101748025
>>
>>101798437
>>101798491
>>101798497
They need to get rid of the repetition in the magnum series, which was learned from Claude. If you see it start a paragraph with "Despite", that's the cue for the same paragraph opener in every reply from then on. I know it's learned from Claude because I switched to Opus and it did the same thing.
>>
>>101798541
they said they are working on that
>>
>https://huggingface.co/openbmb/MiniCPM-V-2_6/
>Based on Qwen2-7B
>Benchmarks show it's better than Sonnet 3.5 at understanding video and images. Can even understand video in real time, like that Google demo a year ago that turned out to be a scam
Real shit or another chink scam?
>>
>>101798722
Oh shit it can describe semi-nsfw anthro art with really good accuracy, it can even identify the species, best model I've ever tried at this.
>>
>>101798722
It's absolutely true. I tried it, the big models are now useless.
>>
>>101798768
No hints in the text, just "Describe this image extensively". This is going to be great to tag datasets for flux finetunes
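Basic tagging loop for that, if anyone cares: one sidecar .txt caption next to each image, which is the layout most diffusion trainers accept. caption_image() is a hypothetical stand-in for however you call the VLM, not a real function:

from pathlib import Path

def caption_image(path: Path) -> str:
    # hypothetical wrapper: send the image to your vision model with
    # the prompt "Describe this image extensively" and return the text
    raise NotImplementedError

def tag_dataset(image_dir: str) -> None:
    for img in sorted(Path(image_dir).glob("*.png")):
        txt = img.with_suffix(".txt")
        if txt.exists():
            continue  # already tagged
        txt.write_text(caption_image(img), encoding="utf-8")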
>>
File: minicpm 2.6 averi test.png (34 KB, 1042x534)
34 KB
34 KB PNG
>>101798828
Uploaded the uncensored version of the image instead of the censored one
Jannies be merciful please
>>
Best 7b/8b model out for roleplay? I haven't really downloaded anything new in some time
>>
>>101798849
magnum 12b v2
>>
File: pepe cry suit.jpg (128 KB, 936x924)
128 KB
128 KB JPG
python dependency hell is so fucking real
>>
>>101798969
bro, your koboldcpp?
>>
>>101799020
I'm trying to run MiniCPM 2.6 using the transformers library. It requires flash attention, which requires torch. I have torch installed, but it says I don't, even though I can import torch perfectly; installing another version of flash-attn gives me a different error.
>>
>>101799042
Do you have the GPU version of torch installed? There's a function that you can call to test if it sees your CUDA devices.
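Something like this will tell you right away if you ended up with the CPU-only build (these are all standard torch calls):

import torch

print(torch.__version__)           # a "+cpu" suffix usually means it's the CPU-only wheel
print(torch.version.cuda)          # None on CPU-only builds
print(torch.cuda.is_available())
if torch.cuda.is_available():
    for i in range(torch.cuda.device_count()):
        print(i, torch.cuda.get_device_name(i))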
>>
>>101798969
Oh no! Running pip install -r requirements.txt is too hard for my nigger brain.
>>
>>101791015
tears in my eyes
>>
Is column-r better than mistral large? A bit annoying that it always outputs bullet points; I guess columns
>>
>>101798497
Yes, I prefer mini-magnum-12b v1.1 and saw some other similar sentiments, though maybe it's dependent on cards and settings, so v2 might still be worth a try to see if it works for you.
Generally speaking I thought the word choices in v1.1 were more interesting/fun, and v2 was too horny. Even with cards and situations where nothing sexual was implied so far, it would suddenly escalate out of my control from a simple action like touching the character's body. 1.1 is entirely capable of horny stuff without jumping the gun all the time.
>>
>>101799072
yeah, I've tried uninstalling it and reinstalling it.
Now it's asking me to update VC++ to 14.0
>>
Has anyone done magpie finetuning on Sonnet or Opus outputs by now?
>>
>>101799074
you are the dumbest gorillanigger monkey I've ever seen
>>
File: sus-column-r.jpg (320 KB, 838x1996)
320 KB
320 KB JPG
Sus column r
>Tranny Image Filter
if (result.containsTranny){
return {cancel: true}
}
>>
>>101798107
>>101798120
It is no longer necessary to combine the 3 GGUF files into a single one, just put them into the same directory and load the first one.
Concatenating them by hand will result in a broken file.
>>
how do I make an llm read a bunch of pdfs
>>
File: woky.jpg (350 KB, 857x1788)
350 KB
350 KB JPG
>>101799294
>>
>>101799161
My cynical gut tells me that the biggest change from 1 to 2 is the ChatML template moving the output away from the sequences trained in during instruct alignment.
>>
>>101798853
I like it, thanks
>>
File: 1702709289129861.png (8 KB, 407x147)
8 KB
8 KB PNG
>>101799340
Could not load text model
>>
>>101799463
Sorry, I confused your use case.
I thought you had a split gguf model, not a split safetensors model.
For that you need to convert it using the convert-hf-to-gguf.py script in the llama.cpp repository.
>>
>>101798497
>>101799161
Had the same experience as this anon. It seems to default every character to horny as soon as the first chance presents itself, even if it doesn't make any sense. I tried it with a dozen different scenarios with pre-made cards, custom cards, and in story prompts and it always does it.
>>
>>101798722
>You may not use MiniCPM series models or their outputs and results to improve any other large language models (other than MiniCPM or its derivatives).
As long as you freeze the text encoder you can use it to train a diffusion model with the tagged images
>>
>>101799498
>It seems to default every character to horny as soon as the first chance presents itself
so just like claude
>>
>>101799340
Even if they’re not gguf split files but simply chopped up?
>>
>>101798969
I used to trialscum exl2 in google colab or vast, but nowadays it takes an hour to set up dependencies. Sticking to kcpp now.
>>
>>101799533
>>101799533
>>101799533
>>
>>101798525
Depends on your use-case, but for mine, definitely.
>>
>>101799546
The script will read in multiple safetensors files and convert them to a single GGUF file.
But I just remembered that you also need some of the other files that describe the tokenizer and such.
The easiest would be to search for the model name on huggingface and see if someone uploaded a GGUF conversion.
Or use a frontend that has built-in functionality for downloading models.
>>
>>101799605
I meant split gguf files. Sorry I’m not the same guy.
>>
>>101799740
If you just chop up a file it will not work.
>>
>>101798828
>>101798848
>doesn't mention anything about the focus on the ass
>doesn't mention anything about the style



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.