/g/ - Technology

File: miku bread.jpg (270 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>100154945 & >>100166886

►News
>(04/24) Snowflake Arctic Instruct 128x3B MoE released: https://hf.co/Snowflake/snowflake-arctic-instruct
>(04/23) Phi-3 Mini model released: https://hf.co/microsoft/Phi-3-mini-128k-instruct-onnx
>(04/21) Llama3 70B pruned to 42B parameters: https://hf.co/chargoddard/llama3-42b-v0
>(04/18) Llama3 8B, 70B pretrained and instruction-tuned models released: https://llama.meta.com/llama3/
>(04/17) Mixtral-8x22B-Instruct-v0.1 released: https://mistral.ai/news/mixtral-8x22b/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling/index.xhtml

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1691041725883639.png (359 KB, 512x512)
what are the requirements for using a local model together with an LLM?
i have 64GB RAM and 16GB VRAM on an AMD system. i normally use koboldcpp for llms and comfy for SD stuff.
>>
It's over
>>
>>100173514
>Previous threads: >>100154945 & >>100166886
>>
>>100173514
>>100173573
>>100173584
>>100173590
good morning sir!
>>
>>100173573
The absolute state of /lmg/
>>
>>100173573
>an internet connection
>the ability to read
>a lot of time
I think that about sums it up
>>
>>100173573
depends on what size you're willing to run. you should be able to run an 8b - 11b model and have enough space for sd as well, probably.
>>
>people still recommending Mythomax and fucking CR+ to newbie VRAMlets
Why? Is this some form of gatekeeping I'm too deep in to understand?
>>
File: mizuasobi.png (1.2 MB, 1304x744)
►Recent Highlights from the Previous Thread: >>100166886

--Enabling Local Language Models to Access External Sources: >>100170746 >>100170878 >>100170905 >>100170924 >>100170942 >>100170947 >>100171324 >>100171202
--Optimizing LLMs for Reasoning: Phi's Limitations and Future Directions: >>100167878 >>100167897
--Anon's Experience with Llama 3 70b Instruct: Shortening Responses Near Context Limit: >>100167911 >>100169713 >>100169792 >>100170112 >>100170062 >>100170270
--Noticeable Quality Drop with Quantization in Llama 3 Models: >>100169493 >>100169506 >>100169525 >>100169914
--Are Lengthy Multi-Rule Prompts Killing Model Creativity?: >>100167192
--Anon's Llama Model Performance Benchmarks: >>100167274 >>100167298 >>100167910 >>100167941 >>100168292
--The Utility of Large Language Models: Beyond Fiction Generation: >>100167521 >>100167544 >>100167555
--Can LLMs Generate PDBs from Decompiled Programs?: >>100167690 >>100167736 >>100167871 >>100170388 >>100170564 >>100170589 >>100170630 >>100170953
--Anon's Take on Meta Stock Drop: Faith, Hope, and Market Volatility: >>100168186 >>100168191 >>100168690 >>100168736 >>100168749 >>100168765 >>100168789
--Best Model for ERP and Productivity Tasks?: >>100171747 >>100172054 >>100172096 >>100172361 >>100172407 >>100172423
--Snapdragon X Plus: Promising AI Performance or Overhyped?: >>100168557 >>100168605 >>100168624 >>100168651 >>100168671 >>100168774
--Llama-3-Instruct Model Discussion: Censorship, Prompt Structure, and Role-Playing: >>100167135 >>100167187 >>100167229 >>100167265 >>100167298 >>100167307 >>100167350 >>100167575 >>100167610 >>100167631
--Understanding the Difference Between Uncensored Models and Psycho Models: >>100167678 >>100167724 >>100167813 >>100167880 >>100170302
--Integrating Comfyui with Stable Diffusion 3: >>100168344 >>100168378 >>100168420 >>100168685
--Miku (free space): >>100168445 >>100166912 >>100170598 >>100171118 >>100173294

►Recent Highlight Posts from the Previous Thread: >>100166891
>>
>>100173573
>how do I use an LLM with an LLM
anon...
>>
File: 1704467287466611.png (444 KB, 512x512)
>>100173701

>>100173573
>what are the requirements for using a local model together with an LLM?
whoops. i actually meant image gen.
what i want is to run llm with SD in something like ST. how well does that work?

pls no bully
>>
Anyone try this yet? https://huggingface.co/TheDrummer/Moistral-11B-v3
>>
>>100173717
Wait for true multimodal LLaMa 3, producing perfect Miku images and RP.
>>
>>100173727
I normally love downloading random slop meme models but that name is stupid as fuck so no
>>
>>100173745
https://old.reddit.com/r/LocalLLaMA/comments/1cc6xb1/moistral_11b_v3_the_finetuned_moist_just_got/
Reddit seems to love it.
>Cream-Phi-2
kek
>>
https://huggingface.co/TheBloke/platypus-yi-34b-GGUF

This model, of all things, performs the best at ooba's secret benchmark.
>>
>>100171184
I grabbed the L3 8B 64k context model and tried it with a close to 16k token chat I have.
It wasn't coherent, so either the claimed 64k context is bs or there might be something wrong with the q8 gguf.
I want to rule out user error at least.
Has anyone else tried it yet?
>>
>koboldcpp rocm updated again
we still hanging in there AMD bros
>>
>>100173826
Guy probably just edited the config and called it a day.
>>
File: ITS HAPPENING.gif (826 KB, 320x213)
>>100173829
>ITS REAL
aAAaaaaAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

>There was some big changes upstream, that's why it's taken a while to update kcpp-rocm, trying to get it to work.

YELLOWROSE I LOVE YOU AND WHAT YOU DO FOR AMDBROS
>>
>>100173858
Yeah probably.
I don't know if extending the context is actually possible.
I'd assume you'd have to retrain the model from the ground up.
>>
>>100173914
Nah, you can do large context tuning
It does need to be a full finetune though, I doubt a LoRA could handle it
>>
Tf is the snowflake arctic thing? How much ram?
>>
File: 19215413059.png (8 KB, 581x104)
>>100173869
Nvm its busted with models and settings that work on 1.62 :[
>>
File: 1969783154.jpg (52 KB, 527x177)
>>100174028
Ummmmm YellowRose???
>>
>>100173829
Why don't you just use linux fucking retard
>>
> keeping the dream alive
>>
>>100173752
localllama is extremely clueless so that doesn't mean anything
most of them probably upvoted it because le funny name without trying it
>>
>>100173938
Not that anon, but I could swear that SuperHOT LoRA was a thing.
I guess I'm getting it mixed up with SuperCOT.
>>
>>100174329
superhot lora was a thing and while it mostly worked a full ft is obviously better
>>
>>100173514
>https://huggingface.co/chargoddard/llama3-42b-v0

So this has 76 MMLU which is really interesting. Has anyone here tested it? How does it compare to 70B/8B? Is it improved over 8B or is it retarded?
>>
>>100174462
Everyone who tested it called it irreparably retarded.
>>
>>100174470
It is not retarded. It is schizophrenic. It has a beautiful mind but can't communicate its thoughts very well. Honestly everyone ITT should love it because it is so relatable.
>>
File: quant.png (51 KB, 969x507)
currently making a few exl2 quants for Moistral v3. 8bpw and 5.5bpw for 8gb vramlets
>>
I have a macbook air m2 with 8 gb ram laying around because of work. Is there any worthwhile llm I could run on it?
>>
>>100174470
Interesting, they are working on doing the same to the instruct model, let's see if the results change. Time to try frankenmerges for now.
https://huggingface.co/raincandy-u/Llama-3-Aplite-Instruct-4x8B-MoE
>>
>>100174549
Quanted mistral 7B or llama 3 8b, I guess.
>>
>>100174549
hahahahaha, no
>>
>>100174549
that 8gb needs to be shared with the rest of the OS, so you're looking at like 4-6 for the model
you could run quanted llama 3 8b at best
>>
>>100173717
I think you will want to reserve how ever much space the SD model takes up, and then only load the LLM layers that will fit with your desired context.
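A rough way to budget it (a sketch; the per-layer size and SD footprint numbers below are placeholder guesses, not measurements, so swap in your own):

# Rough VRAM split for keeping SD resident while offloading LLM layers (koboldcpp --gpulayers style).
# Every number here is an illustrative assumption; measure your own models.
def layers_that_fit(vram_gb, sd_model_gb, kv_cache_gb, n_layers, weights_gb):
    per_layer_gb = weights_gb / n_layers              # crude per-layer estimate
    free_gb = vram_gb - sd_model_gb - kv_cache_gb     # what's left for LLM layers
    return max(0, min(n_layers, int(free_gb / per_layer_gb)))

# e.g. 16GB card, ~4GB SDXL checkpoint kept loaded, ~1GB KV cache,
# an 8B quant of ~5GB spread over ~33 offloadable layers
print(layers_that_fit(16, 4.0, 1.0, 33, 5.0))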
>>
>>100174549
https://huggingface.co/apple/OpenELM
>>
>>100174567
>they
It is a guy.
>>
>>100173914
>I don't know if extending the context is actually possible.
feels like 2023 all over again
>>
>>100174110
Now explain what it means in non-wikipedia faggotry terms
>>
>>100174662
I don't see their pronouns listed anywhere.
>>
any decent phi3 finetunes yet?
>>
>>100174110
So why hasn't anyone done llama.cpp bitnet yet? Is it because everyone is lazy or because the existing bitnet models use row-wise scaling factors which llama.cpp doesn't support at all?
>>
>>100174662
>>100174691
What if it's a woman? You know, not a troon, but a real vagina.
>>
>>100174110
Just remind the companies that they can release their bitnet models without the fp16 weights likely making it a huge ordeal to finetune them.
>>
>>100174732
It is a guy.
>>100174691
It is a guy.
>>
File: 197943296573298.png (121 KB, 463x576)
>>100174567
>https://huggingface.co/raincandy-u/Llama-3-Aplite-Instruct-4x8B-MoE
>SOMEBODY ACTUALLY MADE A 4x8B
>Q6 is only 20gb
WE ARE SO FVCKING BACK
WE HAVE NEVER BEEN THIS BACK BEFORE
I DONT EVEN CARE IF ITS SLOP
>>
>>100174747
Why are zoomers like this?
>>
>>100174747
Well post your logs from it
>>
>>100174747
When we get tunes like NousHermes, wizardLM, etc... the frankenmerges will be really good.
>>
>>100174758
Download speeds are bad in america for no reason
>>
C-R+ user here. I tried Llama 3 70B instruct and it was slop. I tried Llama 3 70B base and it was schizophrenic.
What's the deal with the people saying it's good? Is there a magic prompt? You can't even preload context because it only has 8k max.
>>
File: porky.png (325 KB, 576x566)
>>100174780
>for no reason
>>
>>100174780
>for no reason
Oh, there are reasons.
http://irregulators.org/bookofbrokenpromises/
The numbers on that one are slightly inflated IIRC, but the general idea is correct.
>>
>>100174820
>>100174848
I already know the reason, it's you redditors who don't. it's not 2014 anymore
>>
>>100174797
Llama 3 seems to be sensitive to formatting and templates, if you are using the wrong ones you get schizo, also make sure to pull the latest frontends as they all had bugs early on.
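For reference, this is the prompt format Llama 3 Instruct was trained on; if your frontend isn't emitting exactly these tokens, that's the usual cause of schizo output. A minimal Python sketch:

# Llama 3 Instruct special tokens; generation should stop on <|eot_id|>.
def llama3_prompt(system, user):
    return (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(llama3_prompt("You are a helpful assistant.", "Hello"))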
>>
>>100174747
>moe frankenmerges
Isn't this basically like merging slop, except you don't do the final step (where you calculate the average of all changes and add it into the base model)? Instead you leave all those slop tunes in there so they eat up all the ram and then ask the client to average them out. So you just 4x the required ram for absolutely no reason except retards will buy it?
>>
File: sfa-q8-test-1.png (41 KB, 835x976)
>>100174018
Its a 476.27B mixture of experts model with 128 experts (2 active).
The main download is 1TB. Q8 is 472GB
It's claiming 4096 context, which is disappointing if true, to say the least.
I've managed to quant it down to Q8 with --skip-unknown and am trying to run it after making a few llama.cpp code tweaks to go beyond 60 experts. It has reserved 486GB of RAM to load at that size.
It's currently outputting tokens for me, but there's some kind of fundamental problem because they appear to be half nonsense.
>弘 Hello saf Season Secretary opportun duties season winter Flora</s></s></s>
>>
>>100174889
Nobody cares, dude.
>>
>>100174912
It's not based on llama. Are you sure llama.cpp has added support for it? It's only been a day, I'm surprised it converted and ran without errors.
>>
>>100174912
I have 512GB of ram which could fit Q8. Currently downloading to quant too. Why the --skip-unknown?
>>
>CAPTAIN'S LOG 425
llama3 has been out for several weeks and mythomax3 still hasn't been made. neither have any good finetunes like a holodeck or nous hermes. no news on a possible bitnet 70b either. all the hype gone. all the locals have turned to sonnet and opus. halfway through '24 and not a single decent 40b in sight for regular 24gb vram folk that only have a single card.
>>
File: file.png (44 KB, 611x334)
Do you think we'll ever get that 70B model? And how neutered will it be?
>>
>>100174957
>Are you sure llama.cpp has added support for it
I'm almost certain they haven't.
I'm also shocked it works at all
>>
>>100174973
It's over. Microsoft put the axe to them.
>>
>>100174973
either we get a new 70b trained on llama 3 or nothing
>>
>>100174889
If some models are better than others at a particular task the output should be weighted toward the better ones (useful for including formatting code, etc. in responses). The idea is that you trade VRAM for more parameters without increasing compute requirements.

It's a terrible trade-off for local inference, though.
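For anyone curious what "weighted toward the better ones" means mechanically, here's a toy top-k router in numpy (the "experts" are stand-in linear maps, nothing from mergekit):

import numpy as np

def moe_forward(x, router_w, experts, k=2):
    # Router scores each expert for this token, softmaxes over the top-k,
    # and the output is the weighted sum of only those experts' outputs.
    logits = x @ router_w                     # (n_experts,)
    top = np.argsort(logits)[-k:]             # indices of the k best experts
    w = np.exp(logits[top] - logits[top].max())
    w /= w.sum()
    return sum(wi * experts[i](x) for wi, i in zip(w, top))

# toy example: 4 "experts" that are just different random linear maps
rng = np.random.default_rng(0)
d, n_experts = 8, 4
experts = [lambda x, W=rng.normal(size=(d, d)): x @ W for _ in range(n_experts)]
router_w = rng.normal(size=(d, n_experts))
print(moe_forward(rng.normal(size=d), router_w, experts))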
>>
>>100174960
>only have a single card.
If you didnt come into this hobby with at least 1 good card and didnt get another one its basically joever
>>
>>100174973
We'll get it after llama2-34b finishes red teaming
>>
>>100174986
>If some models are better than others at a particular task the output should be weighted toward the better ones
But that requires training a gate layer that decides where the input goes. Is it this kind of frankenmerge?
>>
>>100174766
OpenHermes/NousHermes is a meme I'll never understand. It contains some good datasets (OpenOrca, Capybara, Airoboros-the good part, Wizard70k) and shit datasets (CamelAi slop, glaive code, alpaca-gpt4, Airoboros-the shit part). The overall result is a mess that can't follow instructions well, is overly verbose and ignores system prompts, yet people praise it like it's the best tune ever.
>>
>>100174976
https://github.com/ggerganov/llama.cpp/issues/6877
>>
>>100174988
There is zero reason to buy 2 cards just to run llms. 2 cards do nothing for gaming, for ai art or music. The only reason for a second card is so you can run unoptimized language models that are inferior to free cloud based ones.
No thanks anon. I'm happy with my 4090, when something finally fits on that we'll be cool. I'm not going to be one of those retards trying to hang 6 cards in open air so I can run a 8x22b still dumber than sonnet which is free.
>>
>>100174998
I don't know enough about MergeKit internals to know what it uses for the base router here. I was assuming a fine-tuned MoE, but you're right that this probably isn't fine-tuned.
>>
>>100175032
thats crazy man, but Im here to run my AI locally.
>>
>>100175032
based, if nvidia still had support for SLI, that would be great
>>
>>100174973
I'm considering buying another 3090 for the fuckhuge version.
>>
>>100175032
>buying 2 cards is unthinkable for him
https://www.reddit.com/r/LocalLLaMA/comments/1c9l181/10x3090_rig_romed82tepyc_7502p_finally_complete/
lol lmao
>>
File: file.png (56 KB, 801x414)
>>100175032
that's some strong copium there buddy
>>
>>100175082
>Q: How is the performance? A: To continue the spirit of transparency, I'll load one of the slower/VRAM hogging models. Llama-3 70B in full precision. It takes up about 155GB of VRAM which I've spread across all ten cards intentionally. With this, I'm getting between 3-4 tokens per second depending on how high of context. A little over 4.5 t/s for small context, about 3/s for 15k context.
>he spent 13k for this
>>
What if when 405B releases, it finally beats GPT4.
But then OAI releases GPT4V and makes GPT4 free.
Would the $10k 10 card 100GB VRAM setups have been worth it?
>>
>>100174096
What's the current AMD Linux meta? I was trying to get exllama running last year, and after getting my rocm installation set up and my torch environment finagled correctly, it ran like absolute shit for larger models/context because flash attention 2 still has no rocm support for consumer hardware. I've checked back every now and again to see if there are any updates, but I've mostly just been using koboldcpp-rocm as well because it's been the easiest to dial in the right tradeoff between speed and model quality with my graphics card and cpu offloading.
>>
>>100175143
I'm gonna share a secret with you anon, gpt4 is already free if you pirate it. Logless, trackerless, and it works on your fucking phone. The only cope here are the retards who fell for the vram bait.
>>
>>100175127
>>he spent 13k for this
he admitted in the comments to being an old boomer, got to spend the grandkids money so they don't inherit anything before kicking the bucket don't ya know.
>>
>>100174096
because im not a tranny
>>
>>100175143
I don't think people with 4x4090 care if they beat SOTA models, they care about freedom, custom finetunes can beat SOTA at specific tasks
>>
>>100175187
Where are those custom finetunes at anon?
>>
>>100174960
sonnet and opus have too many claudisms and put the character card above the context, so you can be talking to someone for three hours and make no progress.
>>
>>100174797
CR+ doesn't follow instructions well, at least with the quants that fit in 48GB of VRAM. And the 70B-instruct is pretty good at following instructions and learning from the context. For that reason I prefer to use L3, CR+ is not very usable in that state.
You need to modify the default 'assistant' role of the chat template to remove the censorship.
I think Rope scale/alpha works well to scale the context.
If you're a /aids/-tier promptlet, you might want to stick to CR+.
>>
File: file.png (99 KB, 512x288)
>>100175143
All micunny rp is free until FBI asks to open up.
>>
>>100175198
they don't exist yet, but you got the message
>>
>>100175199
>put the chrachter card above the context
you can fix this with author notes anon or a dozen other ways like appending it to the jailbreak
>>
Llama-3-8B-Instruct-32k is pretty good at staying in character, but it isn't "intelligent" enough to do that and output a UI at the same time.
Not bad.
>>
>>100175171
Spending that much and not even knowing how to use it is painful to watch.
>>
>>100175211
>He bought 4x4090s for custom finetunes that will come in two more weeks
>>
>>100175032
they want to gen loli porn
it's really as simple as that
>>
>>100175162
Flash attention doesn't matter that much for our LLM usage, it mostly matters when using batching. Hell, flash attention is not required or recommended by default with exllama. If you had trouble running a model, that was not the cause; for LLMs, AMD has better speed per dollar.
But anyway, I'm also disappointed with how slowly they are working on flash attention. For image gen it significantly reduces vram usage; you can get it working with some hacks on RDNA3 but official support is still supposedly in the works.
I just use llama.cpp since models have gotten so big, but now that we are back with a good 8b, I might go back to running exclusively on GPU with exllama.
>>
>>100175220
I'm gonna tell it to do something and pray? I'm sure that'll work and it won't ignore me 5 messages later.
>>
>>100175143
Still not sending you my logs, Sammy boy.
>>100173826
>>100173858
I tried it after quanting it down to 8bpw in exl2. Works fine up to around 22k context or so with RoPE alpha @ 5, then shits the bed.
>>
>>100175169
I really, really hope this is not one of those snarky copebrag posts where you say "it's possible, i just won't tell you how!" with a smugsoyjak face on you. Or it could be plain bait.
With that aside, how do you "pirate" a model with trillions of parameters and run it on your phone? Please enlighten us, and spare us the usual "mmhmh not le telling you bro" reddit shit. inb4 asking a discord troon for a proxy key or scraping git repos
>>
>>100174960
>all the locals have turned to sonnet and opus
Not locals. Those are the tourists and fake ass shitposters. Don't go full retard.
>>
>>100175238
I wouldn't buy 4x4090 for that, unless I begin to work with LLMs, but 2 used 3090 to run 70b Q4 is not a bad idea
>>
>>100175301
I account for some of the claude posts because I wanted to see how green the grass was on the other side.

It's purple. Which isn't bad, just different.

There are parts that I want to bring back to llm that claude did better, but it's not worth switching over. It's worth dyeing my own grass for though.
>>
>>100175277
sounds like a serious skill issue to me anon, I literally just spoonfed you how to fix your problem. if anything claude has better recall than any local model, it can reach 200k context. what can our shitty models do? barely 32k, and they forget shit in the middle so it's basically 8k in front and 8k in the back. what you're talking about is a non-problem for anyone competent.
>>
any cr+ tunes?
>>
>>100175356
My problem isn't with the memory. My problem is with claude having a mind of its own. You can tell it things, you can change the jailbreak, but 5 messages later claude goes "No, I like these tokens better. Cry about it, what you gonna do? Up my repetition penalty?"
>>
>>100175347
It's ok to demo them for the purposes of checking out the enemy. That doesn't mean you've "turned" to them.
>>
>>100175169
>Logless, trackerless
Sounds like some scraped key and server for a big enough company that runs it on their own server instead of through OpenAI, and obviously doesn't expect someone to have hacked them. But logless and trackerless from OpenAI? That is by definition impossible.
>>
>>100175127
>he spent 13k for a talking AI text computer comparable with SOTA that corpos spend billions on
>>
>>100175300
you answered it yourself, you pirate the key, if you're too stupid to do this then you connect to a proxyhost of someone else who has, it's not copebrag but I'm not going to spoonfeed you shit you can take time and learn yourself. you know what happens when retards do that? other idiots come along, brag, flaunt, then it gets fixed and I have to figure out a new way to do this shit. no thanks.
>>
>>100175400
skill issue confirmed
>>
>>100175423
>aicg not sending their finest
>>
>>100175423
>you know what happens when retards do that? other idiots come along, brag
Well why did you say that out loud then, if you don't want more people to ruin it for you? Your best course of action is to shut the fuck up about it and gatekeep it yourself.
Maybe you do want to brag after all.
>>
>>100175474
>idiots coping they bought 4 fucking 4090s when shits free
why didn't you just steal them anon?
>>
I already use GPT-4 for my job. I'm still running local when not on the job. It was never a cost issue. I don't care if you're Sam, /aicg/ or whatever, you're a smelly ass shitposter, fuck off.
>>
Has anyone else been playing around with using something akin to 'Thoughts'/'Plan'/... headers for the response? I think that this could generally result in repetition if they are not unique. So I was thinking of filtering it out of the context afterwards to avoid repetition.
First tests seem promising to me.
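Roughly what the filtering step looks like (a sketch; the 'Thoughts:'/'Plan:'/'Response:' header names are just whatever you told the model to use):

import re

# Drop everything between a "Thoughts:"/"Plan:" header and the "Response:" header
# before the message is appended to chat history, so the hidden planning text
# never accumulates in context and can't get pattern-matched into repetition.
PLAN_RE = re.compile(r"^(?:Thoughts|Plan):.*?(?=^Response:)", re.S | re.M)

def strip_plan(reply: str) -> str:
    cleaned = PLAN_RE.sub("", reply)
    return cleaned.replace("Response:", "", 1).strip()

raw = "Thoughts: she should act annoyed.\nPlan: short reply.\nResponse: \"Hmph. Fine.\""
print(strip_plan(raw))   # -> "Hmph. Fine."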
>>
>>100175101
do you find 275w limit to be the sweet spot? I didn't find any degradation in speed at 250w
>>
File: 00041-404906826_1.png (1.79 MB, 1456x1024)
>>100175423
>>100175502
mfw dual A6000s keeping me toasty quanting another experiment while yet nother /aicg/ streetrat seethes that he has to scrape and scrimp pirated keys for a fleeting taste of the good stuff
>>
>>100175535
wat? It was never a cost issue? Then what is your reason? Are you going to say something stupid like privacy because you're too stupid to obscure your data?
>I'm choosing to eat a shitburger at home BECAUSE REASONS
This is you anon.
>>
>>100175624
You really thought we were spending thousands of dollars on GPUs because $20/month cost too much?
>>
>>100175624
Learn to cook brownoid
>>
>>100175575
Mhmm I'm really seething here I didn't buy a dozen extra cards. I'm so mad you have no idea. Boy if only I had 96gb vram so I could run at a decent quant. Damn I'm mad. FUCK!
>>
>>100175663
t. net worth: $23,404.68
>>
>>100175644
Honestly I thought it was because you're just a fucking retard but maybe I'm wrong. That's why I asked you what the reason was. I still think it's because you're a fucking retard but we'll see if you reply with a good answer.
>>
Fucking cattle, please eat the bugs and own nothing
>>
Dumb question: How to use LCUDA on Windows koboldai? Or is it impossible? Is it better than ROCm?
>>
Is setting rope for llama3 as easy as it was for l2? Does the quality drop significantly or is it completely fine to do it
>>
>>100175683
Do you really think posting your bank account value will win your argument?
>>
File: 00003-1532105500_1.png (1.2 MB, 1024x1024)
>>100175663
>tfw digital streetshitter anon tries very very hard to ironypost
>>
>>100175032
Based
>>
>>100174096
i do. i use dualboot. and i only use SD on linux.
but i just usually use windows due to a few key work related programs and kcpp is just easy to use.
>>
>>100175853
Just use VMs with PCI(e) passthrough.
>>
>>100175644
claude isn't 20$ a month, it's 20$ a month to use on their website.
>>
>>100175900
i wouldn't be comfortable sending my scenarios to claude lmao.
>>
>https://huggingface.co/BXBX/Moistral-11B-v3-8.0bpw-h8-exl2
Done quanting Moistral v3 8bpw exl2, fits on 12GB VRAM with full context
5bpw for 8GB vramlets coming soon
>>
File: WooMiku.png (1.75 MB, 800x1248)
>>100175818
Poverty is noble
>>
>>100174958
is that work or your own hw? impressive, very nice.
>>
>>100175687
nta but I run local models because I enjoy running models locally at home. I get better satisfaction and more enjoyment knowing that it's all on my machine. I wouldn't expect you to understand or care. Even if Claude opus was suddenly free for everyone I would still choose an inferior local model. We live in a different world: I started at the bottom, and my gens have only improved over time. Gpt at least has gotten measurably worse over time. How many times has /aicg/ gone through proxygeddon? compare that to the zero (0) times I have been denied access to my local compute. I enjoy the technology, I enjoy seeing the improvements in models, and I truly do not give a fuck even if corpo models were free 1 billion context and came with a synchronized vibrating onahole.
>>
>>100175687
NTA but you speak like a autist retard
>>
File: 1709079209.png (877 KB, 1290x606)
can i get the latest redpill on using llms for coding assistance? im talking:
- explaining code blocks
- searching for bugs
- creating patches from descriptions of the desired effects

also, is it possible to train an llm on a given codebase to make it more useful?

t. lazy retard
>>
>>100176153
The latest is still the oldest. LLMs are only useful for shitting out jeet code and will not be of any use to a human.
>>
>>100176099
>nta but I run local models because I enjoy running models locally at home. I get better satisfaction and more enjoyment knowing that it's all on my machine.
Autism
>Gpt at least has gotten measurably worse over time.
It is still miles better than local models though
>How many times has /aicg/ gone through proxygeddon? compare that to the zero (0) times I have been denied access to my local compute.
Proxygeddon only happens to poorfags.
>I enjoy the technology, I enjoy seeing the improvements in models, and I truly do not give a fuck even if corpo models were free 1 billion context and came with a synchronized vibrating onahole.
Again, autism.
>>
>>100175740
>Dumb question
Yes.
>How to use LCUDA
What is that?
>on Windows koboldai
Koboldai is the pytorch-based one.
>Is it better than ROCm
ROCm is for AMD GPUs.

If you have Linux you can use koboldai with a cuda pytorch for your nvidia card, or you can use koboldai with rocm pytorch for your supported amd card. If you have Windows pytorch+amd=no, and if you have nvidia using WSL and running cuda pytorch in there is recommended.
>>
>>100173514
wait a second
OP added petra to the miku bread.. you're kidding me
>>
>>100176172
ok but what about reading code? explaining shit, just helping me comb through code. like a million jeets who grep the code on my behalf, is that possible?
would finetuning help here?
>>
>>100176153
It's honestly hard to find a good use for LLMs on coding tasks. I tried multiple times using LLMs, be it local, GPT-4, or Opus, on any subject I was knowledgeable about and they were just a waste of time.
The only time they are useful is looking for basic shit in a language I'm not familiar with; it's faster than using a search engine. But for that, any model is good enough, I currently just use llama 3 8b for it. I use some neovim plugin, but 80% of the time I use it for editing text like mail or commit messages instead of code.
>>
So has anyone built a chatbot with hierarchical annotated memory yet? Not agent stuff, just simply what we've been using already, except with a better memory system than simple vector DB RAG.
>>
File: 849.png (498 KB, 1066x863)
>tfw you realize meta got dedicated pajeets filtering next llama models
>>
>>100176201
I've found L3 70b is the best at spitting out useful code that works out of the box and is ok for analysis but is hamstrung by its medieval context limits.
Yi-34b-200k is surprisingly good for analysis using in-context training if the portion of your codebase fits into the context limit.
>>
>>100176199
Gotta give him credit. All it took was some subtlety.
>>
File: file.png (49 KB, 730x409)
>trained multiple ridiculously performant fine-tunes
which ones?
>>
>>100176287
>LLaMA 3
>extended context from 8K -> 128K
Ok, where have you all been keeping this from me.
>>
>>100176194
>autism
sure, and?
>it is still miles better than local models though
not anymore, at least for erp. and if you were to remove its ability to search the internet I think it would generally suck at everything with how lobotomized it has gotten
>proxygeddon only happens to poorfags
such as yourself? since a few grand is obviously beyond your purchasing power
>again, autism
thankfully my autism has gotten me a job that pays well enough that I could buy a brand new 3090 every two weeks without compromising my lifestyle or having to draw blood for my mortgage payment
>>
>>100176287
>128K context
Is the idiot mixing up Llama 3 with Phi-3?
>>
File: 00036-468519150.png (1.69 MB, 1456x1024)
>>100176312
>my autism has gotten me a job that pays well enough that I could buy a brand new 3090 every two weeks without compromising my lifestyle
Based buyer and saver
>>
>>100176299
First I've heard of it.
>>
>>100176325
Was Phi's 128k version even real context? Like why even release the 4k version if they have 128k?
>>
>>100173727
It has the least slop and GPT-isms I've seen in a long while. It's not the smartest, but the vocabulary sells it for me and is a breath of fresh air (been messing around with it for an hour or two)
>>
>>100176325
>twitter AI personality has no idea what he's talking about
many such cases
>>
>>100176299
https://huggingface.co/MaziyarPanahi/Llama-3-8B-Instruct-32k-v0.1-GGUF
https://huggingface.co/NurtureAI/Meta-Llama-3-8B-Instruct-64k-GGUF
>>
>>100176199
I recognize our dear old petra, same antics.
>>
>>100176345
This smile reminds me of that one image that was drawn by the drawfag that ended up trooning.
>>
>>100176369
>32k
>64k
That's not 128K.
>>
It's so funny how certain models work so much better with the wrong prompt format if you aren't trying to use it as an assistant.
Talking about Qwen 1.5 32B specifically, but I've seen that happen to other models too.
>>
File: 00022-1199107278.png (1.22 MB, 1024x1024)
>>100176369
>Meta-Llama-3-8B-Instruct-64k
Fake as fuck. No instructions or description of what they did to extend the context. Tested it in exl2 already and it barely works up to 20k with rope
>>
File: 24-04-19 09-59-12 1242.jpg (153 KB, 1024x1024)
I've been playing with the small 8B llama3, and I notice it likes to "O-oh..." me a lot - both with a very simple prompt directly in the llama.cpp API, as well as some of my favorite cards in SillyTavern.
I haven't tried 70B yet, since I'm on vacation at the moment and only have the 32GB macbook to play with.
>>
Anyone have a sense about mradermacher's older imatrix quants, given that the llama3 ones are broken? I downloaded one of his WizardLM2 8x22B, and I'm trying to figure out if I need to get a different one (or download a third of a terabyte to quant it myself)

From https://github.com/ggerganov/llama.cpp/issues/6841 it sounds like the breakage was resulting in outright garbage, as opposed to subtle quality loss. So, given that the model I have is not spewing obvious garbage, it seems likely fine. But I wanted to double-check, in case there's an insidious "subtly worse" failure mode that I would never notice.
>>
Do you think the upcoming Phi 7B or 14B will beat Llama 3 8B?
>>
>>100176244
>>100176261
thanks for the info guys, wish i could leave you some reddit gold but this website doesnt let me :(
>>
>>100176506
yes it will be more slopped
>>
>>100176506
I think it will have strengths and weaknesses over the Llama but not beat it. It is a very different dataset and that will show in what it can do well. The 3.8B already beats all 70B+ local models on some problems I tested it with.
>>
miku posters are unhinged
>>
>>100176195
**ZLUDA mb
>>
File: Miguruguru.png (1.62 MB, 800x1248)
1.62 MB
1.62 MB PNG
>>100176550
>unhinged
I think you mean "ascended"
>>
>>100176548
>The 3.8B already beats all 70B+ local models on some problems I tested it with.
I can hardly believe this unless you post logs.
>>
>>100176550
Unfortunately I only have niche tastes, not mental illness. Otherwise I could blame it on mental illness, rather than just being a weirdo.
>>
>>100176566
Good morning, sir.
>>
spoonfeed me an easy way to set up a high quality TTS for text generation webui.
>>
>>100176194
I think anon is retarded for buying hardware without getting net returns on the investment (they could at least sell their GPU compute on vast.ai and pay off the cost of the GPU in like a year, but being a provider is more difficult than just buying from a provider and saving your money).
I am looking forward to renting 400+gb of vram for $10-20 an hour to try erping with llama 400b.
That hardware would cost me $50,000 (more like $100,000-200,000 with h100's, if I bought the hardware I would go for a 2x mi300x or 20x 3090's). Considering the fact that I fap in like 10 minutes every day, it would take me 2500 days for the $50,000 worth of hardware to be more worth it than renting for $20 a day (and I don't even fap to AI every day or account for the cost of electricity).
But who knows, maybe 400b will be shit for ERP, and nobody can finetune it.
>>
>>100176600
>erping with llama 400b.
>That hardware would cost me $50,000
Like, a fifth of that if you're not retarded.
>>
>>100176600
>hardware would cost me $50,000
still cheaper than getting divorced
>>
70b rp undi finetune wen
>>
>>100176567
I guard my test set so that it will never have even the slightest chance of being trained on, so I will not do that. You're free to distrust my claims.
>>
>>100176639
monday, 3pm
>>
>>100176287
I tried the dolphin 8B finetune and yeah it's uncensored but it made it retarded. I got base 8B to solve a simple math problem (yes, I know) but then the dolphin finetune failed.
>>
>>100176566
sir please do not redeem ze miku shartsune bloody bastard kind sir thank you
>>
>>100176550
they were never on a hinge to begin with
>>
>>100176623
I like arguing over hardware, give me your dream setup for 400b (even if it's Q4, I am probably gonna rent for Q8).
>>
File: 24-04-19 22-00-14 1393.jpg (202 KB, 1024x1024)
>>100176600
>I think anon is retarded for buying hardware without getting net returns on the investment (they could at least sell their GPU compute on vast.ai and pay off the cost of the GPU in like a year, but being a provider is more difficult than just buying from a provider and saving your money).
I highly doubt you can break even on electricity costs from vast, let alone pay back your hardware. I feel like the only party making money would be vast.
>>
>>100176725
>give me your dream setup for 400b
2 x C4140 (8xV100 32GB) = 256GB VRAM for $10k
>(even if it's Q4, I am probably gonna rent for Q8).
Pretty sure bigger models suffer less by being quantized. Q4 should be fine, but even if it's not, I have a spare 3090 and can offload the rest to RAM.
>>
>>100176731
>I feel like the only party making money would be vast.
Why bother doing math if you have feels, right?
>>
running local is just stupid in 2024 I really don't see the point and all the arguments are just justifying my reasons further in fact all I'm really seeing is cope and retards with too many cards
>>
How do you know how many context tokens a model can handle?
>>
File: 24-04-19 10-05-34 1251.jpg (223 KB, 1024x1024)
>>100176813
OK, how much profit do you make from vast.ai?
>>
>>100176623
>400b.
>$50,000
What is the price point where you would start considering a mail order bride? And what would be the number of beaks for that price where ai wins over bride?
>>
>>100176960
A wife might be more financially sound if you have zero income, but otherwise you have to consider the 50%+ of all your wealth and income you pay in perpetuity
>>
>>100176960
Women are fucking expensive to keep happy. More so if you have children. Just one kid will cost you quarter to half a million dollars before you can legally kick them out. So, beaks can 10x and they'd still be cheaper in the long run.
>>
>>100176960
NTA but 3D can't compete with AI fantasy roleplay.
>>
>>100176841
It usually says on the model page, but if you're running a gguf it's also in the metadata that's displayed when you load the model.
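If you'd rather check before downloading the weights, the trained context length is also in the HF config (a sketch with transformers; the Meta repo is gated, so point it at whichever mirror you actually use):

from transformers import AutoConfig

# max_position_embeddings is what the model was trained with; anything a quant
# uploader claims beyond this is rope/NTK extension, not extra training.
cfg = AutoConfig.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
print(cfg.max_position_embeddings)  # 8192 for Llama 3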
>>
>>100176960
The only reason I don't have a mail order bride is nobody taught me how to do that. Hell, I think some of the countries pay YOU to get a girl a greencard.
>>
>>100176822
You're not wrong but what other options do we have? I'm not waiting ten years for ai to get better I'll just play with it now even if it's bad.
>>
>>100176312
>not anymore, at least for erp.
lol
lmao even
this is false, but even if you weren't saying this out of your ass, how would you know? aren't you a LOCAL autist? Or are you telling me you tried Claude Opus? Did you lurk aicg to see how good Claude Opus is?
I guess this tells a lot about you.
>such as yourself? since a few grand is obviously beyond your purchasing power
Cope. I would rather invest my money to retire earlier than waste all my money on niche hardware that will lose its value and become deprecated in a few years.
>>
>>100176960
>mail order bride
yeah let me pay to get into a retarded relationship that will simmer with resentment until it explodes, sounds like a great investment
>>
>>100176797
maybe you might find that at a local liquidation auction that won't accept shipping, but I have a feeling that you will only get like half the vram for $10k, and getting 400gb would add up to around $40k.
>>100176813
Not anon, I think the people hosting are 100% making money, but I think what anon is referring to is that residentially you don't have access to cheap electricity and cheap ISP service (and I think business rates are cheaper than residential + less taxes, but the downside is I think you need to own a company building in a business zone or something).
So it's the same problem people felt with mining bitcoin, when mining coins on a gaming GPU cost more in electricity than what it paid out.
I still think you can pay off your 4090 in a few years, but if it's constantly at 100% power draw (400 watts) at a cost of like 15 cents per kilowatt-hour, that's about $500 per year. But if selling to vast.ai is 20 cents per hour (below market) you get $1750, so you pay off your 4090 in a year on paper (realistically it's not at 100% load 24/7 but also not rented 24/7, and that's not counting internet and what vast takes out).
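Putting those guesses in one place so the break-even is explicit (every rate below is an assumption from this post, nothing measured):

# Back-of-envelope 4090 payback on a rental marketplace at 100% utilization.
watts, price_per_kwh = 400, 0.15          # assumed draw and electricity rate
rent_per_hour, card_cost = 0.20, 1750.0   # assumed rental rate and card price

electricity_per_year = watts / 1000 * 24 * 365 * price_per_kwh  # ~$526
gross_per_year = rent_per_hour * 24 * 365                       # ~$1752
net_per_year = gross_per_year - electricity_per_year
print(f"payback: {card_cost / net_per_year:.1f} years")          # ~1.4 years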
>>100176635
Honestly I wish I could cope and say "robo wife is cheaper" I think sex is 100% free if you try to keep it that way, and the only downside is that humans have ego and they don't follow and like everything you do unlike an AI. I don't want a GF because I think women are going to fuck up my self confidence as a virgin and I won't be truely happy if the girl isn't truely happy, and it's weird how girls on dating apps casually had sex with 30 guys, I feel like there is some sort of societal imbalance preventing people from just being together forever. So I guess I'm an AI incel???
>>
>>100177130
if you don't treat her like shit enough fucking will make any girl love you cause oxytocin. the tough part for most is getting to the fucking part
>>
>>100177136
>maybe you might find that at a local liquidation auction that won't accept shipping, but I have a feeling that you will only get like half the vram for $10k, and getting 400gb would add up to around $40k.
Again you retards and your feelings. I already have one. Just need to get a second in the upcoming months.
>>
>>100177114
i'll have this hardware and still retire early. and I was specifically referencing gpt4 which has measurably gotten worse, there have been academic papers about this even
Two grand for 2x3090 plus a few hundred for 128gb memory has literally zero bearing on my retirement whatsoever. I am so sorry that you are struggling in life, and I hope that things get easier for you in the future. I'm going to continue having fun with my local models and I'm skeptical that there's anything you can do about it
>>
Can llm learn anything from large code base?
>>
File: BoheMiku.png (1.74 MB, 800x1248)
>>100177136
>I think sex is 100% free if you try to keep it that way
Yeah, but you tend to end up with chicks like picrel
>>
File: tttet.jpg (428 KB, 1825x1152)
>>100161344
>>
>>100177145
idk anon, in my experience you gotta hit that infatuation mark before the fucking for the woman love to set and cure properly
>>
>>100177181
>/lmg/ actually believes they'll be able to run gpt4 on 2x3090s
holy cope batman
well you'll figure it out eventually how are those L3 finetunes coming along btw?
>>
>>100177263
Me on the left side of the right image
>>
>>100177181
I see, so you close your eyes to avoid facing the reality... Such unfiltered cope.
>>
>>100177181
>2x3090
LMAO, I hope you have plans to buy more for LLaMA 3 400B
>>
>>100177263
Is the pixel Teto AI-generated? If so, model?
>>
>>100177181
>"richfag"
>didn't buy 4090
ngmi
>>
>>100177370
only retards go with 4090s ideally you want 10 to 20 3090s to future proof yourself
>>
>>100177340
400B won't be noticeably better than 70B anyway. Mememarks aren't everything.
>>
>>100177343
https://www.mediafire.com/view/zzr1x9dzf0b9vuz
>>
>>100177380
why are you calling CUDA dev retarded? take it back
>>
>>100177382
This is true. There's a paper from Google that shows scaling up compute without scaling up training data will net you minimal gains. And from the looks of things we're plateauing data-wise. Sure Altman may try to retard strength it but he won't get his superintelligence that way
>>
>>100177382
I actually agree with you. I still think OpenAI/Anthropic has some secret sauce.
>>
File: PersonalMikuDJ.png (59 KB, 1136x912)
>>100177343
nta, but pixelArtDiffusionXL_spriteShaper.safetensors [7adffa28d4] works really well for me
>>
>>100177452
The secret sauce is 256x1B
>>
>>100177114
>Did you lurk aicg
NTA but i go in there maybe once a month and the few logs i've seen posted are roughly the same as the ones you see in here, except the perversions are an order of magnitude more retarded.
this really is the dumbest shit to get upset over or try to argue about
>>
>>100177501
>>100177263
>>100177260
>>100176910
>>100176731
https://www.youtube.com/watch?v=fsUvejZPTLI&t=3595s
>>
>>100177452
the secret sauce is proprietary datasets containing copyrighted information. beyond that I really don't think they're doing much more than some fancy vector db and plugins to pull from external sources on GPT's end. Claude I don't think has any tricks like that, just a good well-curated dataset.
>>
>>100177380
>only retards go with 4090s ideally you want 10 to 20 3090s to future proof yourself
3090 and 4090 have essentially the same memory bandwidth bottleneck, so 2 3090's will run at about the same speed as 2 4090's.
The larger the model, the more bandwidth needed; for inference NVLink/PCIe is not the bottleneck, memory bandwidth is.
So if you can run a Q4 70b model at like 15 tk/s on 2 3090's, you should be able to get 3-5 tk/s if you get more 3090's to run Q4 400b (someone with 10 3090's loaded 70b at full precision with 150gb of vram usage and it ran at 3-5 tk/s https://old.reddit.com/r/LocalLLaMA/comments/1c9l181/10x3090_rig_romed82tepyc_7502p_finally_complete/).
If you want something that will run 400b at a fast speed, you need something like an H200 (it costs as much as a luxury car) or AMD's MI300X (a fraction of the price, but requires special OAM for the mobo, and AMD LOL).
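Easy to sanity-check with the usual rule of thumb: single-stream decode speed is capped at roughly memory bandwidth divided by the bytes of weights streamed per token. A sketch with approximate sizes:

# Splitting layers across cards doesn't add bandwidth for a single stream (the
# cards run in sequence), which is why 10x3090 still lands in single digits.
def max_tok_per_s(bandwidth_gb_s, active_weights_gb):
    return bandwidth_gb_s / active_weights_gb

print(max_tok_per_s(936, 40))    # ~23 t/s ceiling: ~40GB Q4 70B, one 3090-class card
print(max_tok_per_s(936, 230))   # ~4 t/s ceiling: ~230GB Q4 400B spread over many cards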
>>
>>100177599
*take this with a grain of salt, I have zero knowledge in actual AI or benchmarks, I am looking for someone to call me an idiot
>>
>>100176502
No way anon. After seeing the way he acts when others point out the holes in his bad imat files I'm staying clear of all his shit
>>
Can anyone point me to code that will let me display images in the Gradio chatbot? I have the image available on local disk and I would like it to present it to me in the chat.
Ive tried embedding it as markdown code and returning markdown code to no avail.
>>
>>100177599
You are a very smart and valuable person.
>>
>>100174110
https://mathchan.org/ai/ needs more love
>>
>>100177599
>(someone with 10 3090's loaded 70b full precision with 150gb of vram usage and it ran at 3-5tk/s https://old.reddit.com/r/LocalLLaMA/comments/1c9l181/10x3090_rig_romed82tepyc_7502p_finally_complete/).
If that reddit retard knew what tensor parallelism was and wasn't running at full precision, maybe his speeds wouldn't be shit.
>>
>>100177695
Can we move /lmg/ there? The captcha would keep out the riffraff. Maybe less raiding and thread hijacking.
>>
>>100177559
Ignorance truly is bliss...
>>
This might explain some stuff for people with quanted llama 3 models.

https://www.reddit.com/r/LocalLLaMA/comments/1cci5w6/quantizing_llama_3_8b_seems_more_harmful_compared/

Apparently llama 3 takes a huge hit going down from 8 bit to 6 bit unlike older models which didn't take a huge hit till under 5 bit.
>>
>>100176005
> jew construct.
>>
>>100177263
teto with sexo
>>
>How Good Are Low-bit Quantized LLaMA3 Models? An Empirical Study
https://arxiv.org/abs/2404.14047

>>100177788
Makes sense. They trained on 15T tokens. Each weight packs a lot more information, meaning quantization is going to hurt more compared to undertrained models.
>>
Guys...
what if...
Guys, listen!

What if we trained a LLM to predict... the previous token of a sentence?
>>
File: 1706272736148234.jpg (227 KB, 960x960)
>test my two new 3090s by loading a 3bpw mixtral onto each
>temps spike to 80~90C and the fans sound like they're about to take off
I guess I'll have to replace the thermal pads on these. The 3090s I already have came with the pads already swapped so I didn't realize how lucky I was.
>>
>>100177599
Does VRAM overclock worth it?
>>
>>100177634
this - don't be a child and blame llama.cpp for your shitty quant. he is clearly a very emotional individual. bart's quants have usually worked for me and there's way less whining when they don't. like it should be.
>>
>>100176703
maybe they figured out "safety" such that trying to finetune it away will make it retarded.
>>
>>100177878
pretty much all GPU's are already overclocked, some people are underclocking their GPU so it's more power efficient and so fans don't spin so hard.
>>
>>100177788
It is reddit so it is like the spergs here that say they see a huge difference between Q8 and Q5 because it touched their cock incorrectly that one time. Except in /lmg/ someone will call him a faggot and a retard and on reddit people will be nice to him.

Any memeplexity measurements done for quants? That is actually the only thing memeplexity is good for. Also makes me think that if he is actually correct (even without giving any source for what was revealed in his dream) that would mean that bitnet is dead. The spare unneeded extra accuracy of weights is a thing of the past, and now you are all going to be running a 13B 8-bit, or a 30B 8-bit for those who got 2 cards.
>>
>>100177938
The buzzword to content ratio in this post is off the charts.
>>
>>100177938
Why didn't you just read up on BitNet before spouting bullshit about it? I bet you think it's a quant method too huh?
>>
>>100177978
Nope I stand by what I said if you don't understand the point then you are dumb.
>>
>>100177938
Even if these heavily trained models hurt from quantization more, it doesn't follow that 13BQ8 > 30BQ4. Packing more data into floating point weights is clearly a horribly inefficient and slow process. Even if a bitnet 70B saturates before fp16 70B (which is something I'd worry about), it should still be better than a 30BQ4 of equal size trained for the same time.
Also the best way to compare quants is kl divergence, but ppl is a reasonable substitute.
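For anyone who wants to measure instead of argue: dump next-token logits from the fp16 and quantized models over the same text and average KL(fp16 || quant). A toy sketch of just the math (the logits here are made up; in practice you'd take them from the two models):

import numpy as np

def kl_divergence(p_logits, q_logits):
    # KL(P || Q): information lost when the quant's distribution Q stands in
    # for the reference fp16 distribution P at this position.
    p = np.exp(p_logits - p_logits.max()); p /= p.sum()
    q = np.exp(q_logits - q_logits.max()); q /= q.sum()
    return float(np.sum(p * (np.log(p) - np.log(q))))

rng = np.random.default_rng(0)
ref = rng.normal(size=32000)                       # pretend fp16 logits over the vocab
quant = ref + rng.normal(scale=0.05, size=32000)   # pretend quantized logits
print(kl_divergence(ref, quant))                   # lower = quant tracks fp16 closer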
>>
>>100176566
She's just a bunch of noise-pollution, a digital abomination created to torture our poor ears. Miku's 'ascension' is just a myth perpetuated by her brainwashed fanbase.
>>
>>100177991
Your "point" goes out the window when you're technically incorrect
>>
>>100177899
That or the Dolphin dataset is garbage. Because the answers are always really short and bad.
>>
>>100172723
What stack are you using?
>>
>>100178124
probably some gay c++ library like imgui
>>
>>100178124
nta but looks like imgui
>>
>>100178149
>>100178151
yeah seems like it thanks
>>
>>100178051
>you're technically incorrect
kill yourself you nigger
>>
File: 20-meetingthepope.jpg (84 KB, 960x539)
>>100178149
<= imgui's dev
>>
>Fire up beat saber
>Look for a song to play
>Most of my songs are Miku
>Remember the meltdown yesterday...
Thanks trannies....
>>
>>100178265
Beat saber more like meat saber
>>
>>100177831
where are the SmoothQuant quantized models of L3-70B-Instruct then? They tested it but they didn't upload the models on their own HF repo? Their repo is here: https://huggingface.co/mit-han-lab
>>
>>100178281
It is a god game for basement autists. I hate physical exercise but it tickled my autism enough that I am still playing it at least once a week for 3 years now.
>>
>>100178293
It is at the bottom of summary... https://huggingface.co/LLMQ
>>
>>100173727
>Closed dataset
>Kobold
Nah.
>>
>>100177831
This is why I don't understand why they chose 8 and 70b. They know the market hardware available. Why the fuck aren't they making a 35b or a 40b? What good is a 70b we have to run at low quants? Is the only point to win stupid benchmarks? Okay then we need a benchmark for 40b, what's that? There are no 40b models? Then you lost to yi, congrats zuck, you lost to yi!
>>
>>100177597
The Claude dataset must be the most interesting one in all of ML imo. It has such a different personality compared to EVERY other language model. I'd love to know what they did.

Given how much smuttier it is than all the others I guess it's possible the only difference is that Anthropic doesn't remove stuff like ASSTR or Literotica from the dataset? But I'm not sure those alone would lead to it having such a different and more human-like personality.
>>
>>100178115
It's the dataset. Every single one of them have so much shit data from deciding to ouroboros synthetic data from other bots and not cleaning it up. I can't believe every one of these people decided that the state of things was fine because they were able to improve on prior Llama releases so 3 would be no different. The finetune for Llama 3 was obviously done on mined Meta social network data and there is no way any synthetic data is going to match that quality. I guess I'm going to have to suck it up and download one of those "uncensoring" finetune models if possible but man, this really sucks that the community got that complacent and fine with the state of things. I don't think for the next 3 months anyone is going to be able to fine-tune past what the official instruct release did because of how much work is needed to clean a dataset to get it anywhere near where it needs to be.
>>
File: file.png (1.22 MB, 768x768)
My anime image of the day.
>>
Can you make a learning model understand causality? How would you encode causality into a model? What would be your mechanism for making a model understand causality? Would you bruteforce it using statistical techniques?
The principles of most models I've seen so far are about encoding world data (text, audio, video, etc..) as compressed bits of information into models. Do you think that current models are able to infer causality from the encoded bits of information as an emergent property? And does it do well the longer you train on the data and the more tokens you feed it?
>>
>>100178471
right now you need to buy new hardware to run the latest and greatest AI models.
however the prices are not going down, so right now we need to spend around $1500-2000 to run 70b, and in the next 2 years you will need to spend around $4000 on the next latest and greatest AI.
>>
>>100178725
>And does it do well the longer you train on the data and the more tokens you feed it?
Yes no maybe? Made me realize that probably at the beginning of training it is learning compression of data instead of actual reasoning. Eventually it should start learning reasoning because it will let it compress more efficiently, but now that I thought about it maybe the problem is that the structure of the network ends up in a sort of local minimum of compression and can't really learn reasoning efficiently? But that would be pretty easy to prove or disprove if you use some benchmark for reasoning during training to see if the reasoning accuracy progresses at the same rate as reciting wikipedia. Also I am just a 4chan moron so I don't know what I am talking about.
>>
>>100173514
I'm trying to figure out how a chatbot could be integrated into a game. I suppose that if you lead with a prompt explaining to the bot the context of the NPC it could talk in the moment, and if you have it say some command e.g. *follow player* or *attack* you could have it interact with the world.

Is there some extensive research group or similar where one can read up on what ideas people have come up with, and how they execute it?
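The usual trick is exactly what you describe: tell the model to emit actions in a fixed bracketed form and strip them out of the reply before showing the dialogue. A sketch (the command names and syntax are made up for the example):

import re

# System prompt asks the NPC to wrap world actions like *follow_player* or
# *attack target=wolf* in asterisks; everything else is spoken dialogue.
ACTION_RE = re.compile(r"\*([a-z_]+)((?:\s+\w+=\w+)*)\*")

def split_reply(reply):
    actions = []
    for name, argstr in ACTION_RE.findall(reply):
        args = dict(a.split("=") for a in argstr.split())
        actions.append((name, args))
    dialogue = ACTION_RE.sub("", reply).strip()
    return dialogue, actions

print(split_reply('"Stay close." *follow_player* *attack target=wolf*'))
# ('"Stay close."', [('follow_player', {}), ('attack', {'target': 'wolf'})])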
>>
File: 1.png (66 KB, 702x318)
picrel: response with prompt from >>100171961
this DPO tune, working and failing at the same time. https://huggingface.co/mradermacher/Llama3-8B-DPO-uncensored-GGUF
>>
>The sexual tension builds deeper in her spleen, her body responding eagerly.
wat
>>
>>100178725
Isn't this attention? Statistically bruteforcing what appears to be causality? Is there really some essence of understanding the causal links? Is it just a mirage from dumb rule following? In the Chinese Room does that matter?
>>
>>100178870
How is spleen tokenized? Did your sampler have sp- and not pick spine?
>>
>>100178656
Whether they (dolphin/hermes authors) like it or not, they'll eventually have to scale finetuning data down to curate it properly instead of continuing to use millions of GPTsloppy examples. A relatively small hand-curated finetuning dataset (~10^3-10^4 examples) + large human preference dataset (in the order of 10^5-10^6 examples or more) should be the proper way.
>>
>>100178744
Not so fast richnigga https://hacks.mozilla.org/2024/04/llamafiles-progress-four-months-in/
"Today, you can today use the very latest and most capable open models with llamafile thanks to her hard work. For example, we were able to roll-out llamafiles for Meta’s newest LLaMA 3 models–8B-Instruct and 70B-Instruct–within a day of their release. With yesterday’s 0.8 release, llamafile can also run Grok, Mixtral 8x22B, and Command-R."
When llamafile hits the mainstream, there will be a shift toward server-class processors with 64+ cores and many DDR5-6400 memory channels for inference-only builds.
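Back-of-the-envelope for why the memory channels are the thing to watch (my numbers, purely illustrative): token generation on CPU is roughly bandwidth-bound, since every generated token streams the whole model through RAM once.
[code]
# Rough decode-speed estimate: usable DRAM bandwidth / bytes touched per token.
# The efficiency factor is a guess; the point is the relative gap, not the
# absolute numbers.

def bandwidth_gbs(channels: int, mt_per_s: int, bus_bytes: int = 8) -> float:
    """Peak DRAM bandwidth in GB/s (channels * 8-byte bus * transfer rate)."""
    return channels * bus_bytes * mt_per_s / 1000

def tokens_per_s(model_gb: float, bw_gbs: float, efficiency: float = 0.6) -> float:
    """Upper-bound decode speed for a model that is streamed once per token."""
    return bw_gbs * efficiency / model_gb

q4_70b_gb = 40  # ~70B params at ~4.5 bits/weight

for name, channels in [("desktop dual-channel DDR5-6400", 2),
                       ("12-channel server DDR5-6400", 12)]:
    bw = bandwidth_gbs(channels, 6400)
    print(f"{name}: ~{bw:.0f} GB/s -> ~{tokens_per_s(q4_70b_gb, bw):.1f} t/s on a Q4 70B")
[/code]
Core count barely moves the needle once you're bandwidth-bound; the 6x channel gap is what buys you the speedup.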
>>
>>100176365
>twitter AI fag
>unironically calls it "X"
>"No-Code" in bio
every time
>>
>>100178870
>he's never had a hooker massage his spleen
get a load of this pleb
>>
>>100178913
>llamafile can also run Grok
Is this a reason to be proud?
>>
>>100178913
yup... I'm thinkin' jart won
>>
>>100178913
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>her
>>
>>100178871
If I understand it correctly, no. Attention and Flash Attention just assign higher weight to the information that's most 'important' for the task at hand.
>>100178801
Reasoning benchmarks are retarded. MSFT's Phi team already proved the point - you can train a relatively small LLM, up to 8B, on targeted datasets and it will score high on reasoning benchmarks, but the moment a user actually uses it, it's retarded as fuck. The best 'benchmark' would be making these LLMs navigate a maze in a rogue-like fashion and taking the average over their runs.
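Half-serious sketch of what that maze "benchmark" could look like (entirely my own toy, nobody's actual eval): feed the model the ASCII map each turn, parse one move out of the reply, score steps-to-exit averaged over runs. The random-walk policy is just a baseline so the harness runs on its own; you'd swap in a function that prompts your model and extracts a single move letter from the completion.
[code]
import random

WALL, EXIT = "#", "E"
MOVES = {"N": (-1, 0), "S": (1, 0), "E": (0, 1), "W": (0, -1)}

MAZE = [
    "#######",
    "#.....#",
    "#.###.#",
    "#.#...#",
    "#.#.#E#",
    "#...#.#",
    "#######",
]

def render(maze, pos):
    rows = [list(r) for r in maze]
    rows[pos[0]][pos[1]] = "@"
    return "\n".join("".join(r) for r in rows)

def random_walk_policy(observation: str) -> str:
    # Baseline policy. An LLM policy would build a prompt from `observation`
    # and return the first valid move letter found in the reply.
    return random.choice(list(MOVES))

def run_episode(policy, start=(1, 1), max_steps=50) -> int:
    """Steps taken to reach the exit, or max_steps if it never gets there."""
    pos = start
    for step in range(1, max_steps + 1):
        move = policy(render(MAZE, pos))
        dr, dc = MOVES.get(move.strip().upper()[:1], (0, 0))
        nxt = (pos[0] + dr, pos[1] + dc)
        if MAZE[nxt[0]][nxt[1]] != WALL:
            pos = nxt
        if MAZE[pos[0]][pos[1]] == EXIT:
            return step
    return max_steps

scores = [run_episode(random_walk_policy) for _ in range(100)]
print("average steps to exit:", sum(scores) / len(scores))
[/code]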
>>
/lmg/, please give me some RP situations that 7B/8B models usually suck at.
>>
are there any good models for generating 3d meshes?
>>
Has anyone tried bark.cpp yet?
>>
Some days ago gguf were broken because newlines got merged. Is that fixed by now?
>>
>>100179014
>Reasoning benchmarks are retarded.
You could just use a pure math benchmark. It's all just to check the trend: whether it keeps gradually getting better at math as it gets better at compressing data. If you see the math results slow down while the compression results keep improving, then it is probably becoming just a retarded winrar for text.
>>
>burgers are home
fuck. it's all tech support from here
>>
File: THE SPLEEN THO.png (39 KB, 881x311)
39 KB
39 KB PNG
>>100178888
>pic related

>>100178924
Can't say I have.
>>
>>100179024
https://www.chub.ai/characters/Vyrea_Aster/doppelganger-interrogation-simulator-654daf19
>>
>>100179076
https://github.com/PABannier/bark.cpp
>no .exe
no thanks.
>>
File: thumb-1920-1127692.png (1.13 MB, 1920x1080)
1.13 MB
1.13 MB PNG
>>100179094
I can make the next thread tech support edition.
>>
>>100179146
haven't you made enough threads?
>>
File: q1h6mwgu9vz51.jpg (402 KB, 854x1200)
402 KB
402 KB JPG
>>100179154
After yesterday? I am just getting started.
>>
>>100179094
good morning sir please kind bastard redeem the american burger home thanks!
>>
>>100177831
There's only one fp16 8B quant
https://huggingface.co/MaziyarPanahi/Meta-Llama-3-8B-Instruct-GGUF/tree/main

Also, these results could explain why so many people think 8B is retarded. At fp16 it's superior to Q4 70B, which is huge (and an actually good use of 24GB of VRAM).
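The VRAM math, for reference (weights only, ignoring KV cache and runtime overhead, so real usage is a bit higher; the bits-per-weight figures are approximate):
[code]
# Rough weight-only memory footprint: params * bits-per-weight / 8.
def weights_gb(params_b: float, bits_per_weight: float) -> float:
    return params_b * 1e9 * bits_per_weight / 8 / 1e9

print(f"8B  @ fp16   : ~{weights_gb(8, 16):.0f} GB")    # fits a 24 GB card
print(f"8B  @ Q8     : ~{weights_gb(8, 8.5):.1f} GB")
print(f"70B @ ~Q4_K  : ~{weights_gb(70, 4.8):.0f} GB")  # two 24 GB cards or CPU offload
[/code]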
>>
>>100179092
Ehhh, there are lots of caveats with benchmarks, especially task-oriented ones like math benchmarks. Take Phi-3, connect it to WolframAlpha, and you have your own agentic math buddy. I'm wary of benchmarks because they can be easily gamed. Testing models needs to be broader and more active - stochastic scenarios at different difficulty levels. The trend is towards models becoming agentic, where today's benchmarks will be as trivial as tying shoelaces or putting on a shirt. This is why using benchmarks without updating them as fast as finetunes and models are being trained is such a retarded idea. And no, using LLMs to generate datasets for benchmarks is even more fucking retarded, and whoever came up with it needs to be fed to pitbulls.
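The "connect it to WolframAlpha" bit is just a tool loop, maybe thirty lines. Rough sketch below; the endpoint URL, the TOOL: convention and the stub calculator are all my own assumptions, not anyone's actual implementation - swap the stub for a real WolframAlpha or sympy call if you want more than arithmetic.
[code]
import re
import requests

API = "http://127.0.0.1:8080/v1/chat/completions"  # OpenAI-compatible local server
SYSTEM = ("You can use a calculator. To do so, reply with exactly "
          "TOOL:<arithmetic expression> and nothing else. "
          "When you receive a line starting with RESULT:, answer the user.")

def chat(messages):
    r = requests.post(API, json={"messages": messages, "max_tokens": 256,
                                 "temperature": 0.2})
    r.raise_for_status()
    return r.json()["choices"][0]["message"]["content"]

def calculator(expr: str) -> str:
    # Stub tool: whitelisted arithmetic only. Replace with WolframAlpha/sympy.
    if not re.fullmatch(r"[0-9.+\-*/() ]+", expr):
        return "error: unsupported expression"
    return str(eval(expr))  # acceptable here given the whitelist above

def answer(question: str, max_turns: int = 4) -> str:
    messages = [{"role": "system", "content": SYSTEM},
                {"role": "user", "content": question}]
    for _ in range(max_turns):
        reply = chat(messages)
        messages.append({"role": "assistant", "content": reply})
        m = re.match(r"\s*TOOL:(.+)", reply)
        if not m:
            return reply
        messages.append({"role": "user",
                         "content": f"RESULT: {calculator(m.group(1).strip())}"})
    return reply

print(answer("What is 1234 * 5678 minus 17?"))
[/code]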
>>
>>100179164
>>100179146
I have nothing against kurisu, she is cute. But she is not /lmg/ you are making me slowly but surely dislike her with your threads.
>>
>>100179223
Woah, I suddenly want more Kurisu threads.
>>
>>100179099
>When a person/doppelganger comes into the room, IMMEDIATELY DECIDE IF THEY ARE HUMAN OR DOPPELGANGER, BUT DO NOT TELL {{user}} IN ANY WAY. THEN USE THIS DECISION TO INFLUENCE HOW THEY WILL TALK FROM NOW ON.
>THESE TRAITS ALSO APPLIES TO HUMAN. If {{char}} was talking as human, and {{user}} is being mean and started to accuse them, they will still exhibit the symptom above.
I can see how this can confuse the LLM lol
>>
>>100179223
NTA but she is definitely /lmg/, if you don't know why you should leave immediately.
>>
>>100178854
A little too meme-y. Also, it seems that it doesn't really get the concept of niggebrehaviour. But at least no moralizing.
>>
>>100179221
>I'm fondly wary of benchmarks because they can be easily gamed.
Anon I am talking about trying to get a good model. Not about selling it. Of course you wouldn't be trying to game benchmark or even try to make what I am saying a selling point. It is just my fan theory and I am saying how you could easily falsify it or prove it.
>>
File: 1695117076473585.png (140 KB, 1029x898)
140 KB
140 KB PNG
Did kalomaze give a /g/erdict on this or what?

https://github.com/oobabooga/text-generation-webui/pull/5677

In my experience testing every setting for writing was shit except min_p 0.1
>>
>>100179256
she is definitely related, but the way he tried to make her a mascot by shitting on miku and starting the whole trans herpes miku thing is what's leading to people disliking him, and by extension her
>>
>>100179223
I had nothing against Miku, she is cute. But I came to /lmg/ and /lmg/ is making me slowly but surely dislike her. Curb your autism sperg.
>>
File: file.png (341 KB, 640x480)
341 KB
341 KB PNG
>>
>>100179201
The paper shows basically no degradation at 8bpw, though. And their tables have fp16 8B nowhere near as good as 4bit 70B, even shitty RTN quant, where are you getting that from?
>>
>>100179293
>autist talking about autism
>>
>>100179327
I am keeping mine in check. You should do the same.
>>
File: IMG_20240425_173853.jpg (471 KB, 1080x1127)
471 KB
471 KB JPG
>Our findings indicate that while LLAMA3 still demonstrates superior performance after
>quantization, the performance degradation associated with quantization is significant and can even
>lead to larger declines in many cases. This discovery highlights the potential challenges of deploying
>LLAMA3 in resource-constrained environments and underscores the ample room for growth and
>improvement within the context of low-bit quantization. The empirical insights from our research are
>expected to be valuable for the development of future LLM quantization techniques, especially in
>terms of narrowing the performance gap with the original models. By addressing the performance
>degradation caused by low-bit quantization, we anticipate that subsequent quantization paradigms
>will enable LLMs to achieve stronger capabilities at a lower computational cost, ultimately driving
>the progress of generative artificial intelligence, as represented by LLMs, to new heights.

(V)RAMlet bros... it's over.
>>
>>100179327
>autist talking about a autist talking about autism
>>
>>100179337
>in check
>>
>>100177788
Are there any 6-8 bit 70B quants confirmed not to be broken? NousResearch's are good but they only go up to Q5, and I don't want to spend hours figuring out how to convert and quant it myself.
>>
>>100178957
No, but it's a good proof of concept that you don't need an H100 or multiple GPUs to run a 100B+ parameter model. There's a lot of room for optimization in inference and we're barely scratching the surface.
>>100178982
>nooo it's a heckin' tranny I can't accept his work!!!
beat his work if you can instead of moping around gender identities. you're no better than leftist and normie retards complaining about 'muh patriarchy' when you focus on someone's gender instead of the quality of their work.
>>
>>100179353
vramlets destroyed anally as usual. Btw bitnet will plateau much earlier than fp32. 7B bitnet trained on 15T tokens will be just as shit as 7B bitnet trained on 1T tokens.
>>
>>100179376
>uhm! don't you think that both sides are LE BAD!
go away.
>>
>>100179376
its literally just packaged llama.cpp
>>
>>100179353
did anyone here think otherwise?
full-sized AI on a home pc will never be a reality.
>>
File: msedge_0aAtXxig0v.png (190 KB, 2926x660)
190 KB
190 KB PNG
>decide to update both koboldcpp and SillyTavern since I haven't done it in a while
>everything broken
How do I get CPP to show up in my API list?
>>
>>100179353
>>100178318
>they still didn't upload the smoothquant versions of 70b-instruct.
these fucking cunts, someone message them. both of the HF repos only have the quantized L3-8b versions. how is anyone supposed to validate their findings for the quantized 70b versions?
>>
>>100179440
it's an option under text completion
>>
>>100179440
pick text completion instead
>>
File: msedge_XBbS5ZXAHF.png (75 KB, 1311x538)
75 KB
75 KB PNG
>>100179452
>>100179453
Tried that
>>
>>100179201
I didn't see that on the chart. Also, how is fp16 a good use of VRAM when 8-bit is just as good? And 6-bit isn't on the chart either. You'd probably be better off with a 6-bit 70b.
>>
>>100179478
ur blind
its there http://127.0.0.1:5001
>>
>https://old.reddit.com/r/LocalLLaMA/comments/1cci5w6/quantizing_llama_3_8b_seems_more_harmful_compared/
erm.... GGUF bros what is this..?
>>
>>100179353
Why no gguf or exl2?
>>
>>100179478
Seems like your SillyTavern settings could not be saved. You should check the SillyTavern server connection and reload the page to prevent data loss.
>>
File: msedge_uQ5qgUhRli.png (79 KB, 1303x552)
79 KB
79 KB PNG
>>100179511
>>
>>100179353
This makes me wonder how, say, llama 3 8b would perform if it was trained in 4bit to begin with.
>>
File: 1.png (59 KB, 550x398)
59 KB
59 KB PNG
>>100179544
anon... i...
>>
>>100179544
Close everything then try again.
>>
>>100179478
$10 says you shut down your ST at some point and are still using your old session
>>
>>100179555
Bloated bitnet
>>
>>100179591
It was this, I'm a fucking retard
I had like 20+ windows open looking at different frontends I could try out and models to download and loli tummies to goon over and I got fed up and nuked everything including the server, lmao
>>
>>100179395
>70B bitnet that fits on a 3090 and plateaus at 2T so it performs like llama 2
You know what, I'll take it
>>
So exllama and vllm > llama.cpp I guess
>>
>>100177732
A lot of the research side of /lmg/ would work better over there. Maybe if the quality is dense enough it'd spread by word of mouth; I don't know how to get word to industry devs without inviting the whole planet.
>>
>>100179353
Damn, so it's over for BitNet huh. And the meta will be Q8 8B
>>
>>100179503
>>100179325
Why is everyone going by numbers on a chart? Just try the models out yourself. The benchmarks don't cover everything that can fit into 15T tokens.
>>
>>100179652
What research side
>>
georgie's grift is starting to unravel
>>
>>100179681
I did, 8B fp16 is retarded compared to 70B Q4. This should not surprise anyone. The paper does not even contradict this.
>>
>>100179667
Bitnet isn't quantized
>>
>>100179667
I don't think so.
The problem that's being pointed out is that a model trained on a fuckton of tokens using high precision FP loses information when compressed to a lower precision.
BitNet is, what, 1.58 bpw by default? It already has all the information encoded that way.
Apples and oranges, I think.
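For reference, the b1.58 weight quantization is basically just "scale by the mean absolute value, then round to {-1, 0, +1}". A rough numpy sketch of that one step (the part that actually matters, training with this in the loop via a straight-through estimator, is omitted):
[code]
import numpy as np

def absmean_ternary(w: np.ndarray, eps: float = 1e-5):
    """BitNet b1.58-style weight quantization: scale by mean(|W|), round to
    {-1, 0, +1}. Returns the ternary weights and the scale used to roughly
    dequantize (w ~ scale * w_ternary)."""
    scale = np.mean(np.abs(w)) + eps
    w_ternary = np.clip(np.round(w / scale), -1, 1)
    return w_ternary.astype(np.int8), scale

w = np.random.randn(4, 4).astype(np.float32)
q, s = absmean_ternary(w)
print(q)                          # entries are only -1, 0, or 1
print(np.abs(w - s * q).mean())   # reconstruction error of the 1.58-bit weights
[/code]
Post-training quantization throws information away that the weights already rely on; a BitNet is trained with the ternary constraint from the start, which is the apples-and-oranges part.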
>>
>>100179681
Meta themselves said Q8 has no degradation
>>
>>100179095
>in her spong
>in her sparse
>in her spon
I guess that's what grabbing a 0.02% likely token does.
>>
>>100179787
Kind of makes me wish we had min_p but not scaled to the top token's probability, so I could simply tell the thing to ignore every token under an absolute threshold.
That shouldn't be hard to implement at all - a simple flag that enables or disables the scaling.
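Something like this, on top of a plain softmax (a minimal sketch of the proposed flag; the names and the 0.02% figure from the spleen post are just illustrative, this isn't any backend's actual sampler code):
[code]
import numpy as np

def filter_probs(logits: np.ndarray, threshold: float = 0.0002,
                 scale_by_top: bool = False) -> np.ndarray:
    """Zero out unlikely tokens and renormalize.
    scale_by_top=True  -> standard min_p (threshold relative to the top prob)
    scale_by_top=False -> absolute cutoff (e.g. 0.0002 = 0.02%)."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    cutoff = threshold * probs.max() if scale_by_top else threshold
    probs[probs < cutoff] = 0.0
    return probs / probs.sum()

logits = np.random.randn(32000)  # fake vocab-sized logits for illustration
p = filter_probs(logits, threshold=0.0002, scale_by_top=False)
print((p > 0).sum(), "tokens survive the 0.02% absolute cutoff")
[/code]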
>>
>>100179524
turboderp and ikawrakow haven't paid their membership fees to the quant papers mafia.
>>
>>100179524
Isn't awq close to exl2? I remember seeing that AWQ used parts of exllama
>>
>>100179524
Because no companies care about that shit. Everyone is using vLLM or TensorRT-LLM.
>>
>>100179312
>When you go to sleep amidst a kino plot and in the morning suddenly your model is retarded once more.
>>
>>100179451
>how is someone supposed to validate their findings for the 70b quantized versions?
Stop pretending you can run a 70B at 8-bit
>>
>>100173514
It's been literally two weeks. How can I get koboldcpp to behave when generating with llama-3 70B? It keeps putting out tokens that aren't recognized, like "<|eot_id|><|start_header_id|>assistant<|end_header_id|>",
and each chat ends with an error dialog about unexpected end of output or some such.
>>
>>100179353
Wasn't Meta betting on open source because providers can run llama more cheaply than other models or closed API providers? Doesn't making all their models large, dense, and unquantizable hurt that? Seems like llama might be good for the handful here who can afford to build mining rigs, but if you're trying to run a service, then assuming equal performance, something like Snowflake would be far cheaper and more desirable than a dense, unquantizable L3 405B.
>>
Everything is broken, even hugging.chat llama3 is broken!!!
>>
>>100179353
lmao so based, the more you buy the more you save
>>
>>100180094
I don't think they consciously chose to finally hit the saturation point. Maybe it's even some bug, but even if it is, there will be a point where quants stop working. It's pretty obvious.
>>
>>100180197
>>100180197
>>100180197
>>
god i hope next gen consoomer nvidia cards start at 1000 dollars for 8gb of vram, total vramlet cuck death
>t. h100 cluster GOD
>>
>>100180200
yep time to leave lmg for a day
>>
>>100180209
bye!
>>
>>100180200
>(EMBED)
again
>>
New bake pls
>>
>>100180228
>rent free
>>
>>100180200
Baking on page seven?
The stink of desperation is not appealing
>>
>>100180200
what's with autists coming back from time to time to force their own garbage?
>>
>>100180204
>>t. h100 cluster GOD
yeah cool story bro
>>
>>100180249
I am desensitized to your (you)'s now I want your (ree)'s
>>
>>100180249
>mikufaggot afraid of changes
>>
how do I set up rope for llama3?
I don't want my wife to forget how we met
>>
>>100180370
>I don't want my wife to forget how we met
2MW!
>>
>>100179395 >>100179427 >>100179451 >>100179524 >>100179555 >>100179667 >>100180094 >>100180124
>>100180094
>>100180160
>It is pretty obvious
It's obvious bullshit. You are retarded. Being dense (vs. MoE crap or whatever) or thoroughly pretrained (vs. half-baked like llama2) has no bearing on quality degradation due to quantization. Not even this retarded paper makes such a claim. Those "researchers" didn't even compare against other models. But they know how to prompt you into hallucinating bullshit.
Their finding: quantized models perform worse than unquantized models. That is seriously everything they've got. What a great new discovery.
>>
>>100178913
at a whopping 2 tokens/s, amazing!
>>
>>100180634
I was going to make a post but thanks for doing it for me. It's like they didn't even look at the paper. Some of these posts are pretty suspiciously worded anyway, really makes you think.
>>
>>100177263
I like this Teto


