/g/ - Technology






File: 39_05556___.png (812 KB, 720x1280)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101981616 & >>101970380

►News
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
I'm the best
>>
Is the new Magnum Largestral finetune any good (as opposed to base Largestral)?
>>
>>101990728
This Anon is the best!
>>
►Recent Highlights from the Previous Thread: >>101981616

--Nothing, local is dead.

►Recent Highlight Posts from the Previous Thread: >>101982458
>>
>>101990781
I'm having fun with it for long (E)RP
>>
>>101990805
Bitnet will save local!
>>
>>101990805
Seething schizo. Local is better than ever. Stay mad.
>>
►Recent Highlights from the Previous Thread: >>101981616

--Paper: SMILE: Zero-Shot Sparse Mixture of Low-Rank Experts Construction From Pre-Trained Foundation Models: >>101985710
--THUDM/LongWriter repository for long context LLMs word generation: >>101986037
--Used 3090 shilled on /lmg/ due to being cheapest 24GB GPU: >>101986950 >>101986969 >>101986973 >>101986974 >>101988401 >>101987334 >>101986991 >>101987691 >>101987707 >>101987714 >>101987755 >>101987825 >>101987953
--LLMs used for game development and improving productivity: >>101985401 >>101985435 >>101985592
--Different hardware affects reproducibility of AI-generated logs: >>101986552 >>101986631 >>101986664 >>101986687 >>101986716 >>101986752
--Collaborative AI storytelling and roleplay session proposed: >>101982186
--Tensor Parallelism with uneven GPU count is unsupported: >>101987979 >>101988308 >>101988418
--Mistral Large can run locally but has low tokens per second: >>101983668 >>101983702 >>101984650 >>101986432 >>101986891 >>101987197
--MiniCPM struggles with image insertion in RP scenarios: >>101985433
--Llama 3.1 70B works well for some users, but has issues with chain of thought for others: >>101984397 >>101984421 >>101984513
--Flux performance on AMD hardware is currently slow: >>101987632 >>101987674 >>101987771 >>101987779 >>101987801
--DRY issues in koboldcpp 1.73, user relies on MiniCPM support: >>101983865 >>101983869 >>101983896 >>101983976 >>101983976
--Zeyuan Allen-Zhu's ICML keynote talk is back up: >>101985742
--Reminder to compile with -j flag and cores: >>101985928
--Miku (free space): >>101982419

►Recent Highlight Posts from the Previous Thread: >>101981738
>>
teto...
>>
>>101990781
the iq2_s works for 2 3090 pretty well, i'm liking it
>>
>literally zero magnum finetunes for llama3
Why??
>>
> try one of those Mistral Nemo 12B based models
>in my eyes the scenario is of middling complexity
>bot is a thief in a wizard's tower
>they encounter my character trapped in glass prison
>80% of the time the thief still reaches through the glass or somehow the wizard becomes a friend.
shit's still retarded
>>
>>101991006
Because it sucks.
>>
>>101991011
Are there any models in that weight bracket that do better?
>>
>>101991032
Sucks how? Benchmark numbers look good and I'd rather run 70B at 4.5bpw than 123B at 2bpw
>>
as a retard who only has experience with claude, is my 3.5k sys prompt going to work well out of the box on a model like mistral large? or am i best off paring back the instructions and leaving just basic stuff?
>>
File: large-vs-magnum.jpg (1.42 MB, 3228x5082)
>>101990781
Here's one test.
Large on the left with temp 1.8 / min p 0.1 vs Magnum with temp 1.0 / min p 0.05
>>
>>101991079
>is my 3.5k sys prompt going to work well out of the box on a model like mistral large?
no
>>
>>101991095
thanks anon, i appreciate the help. are the very short prompts i see, the ones that are just `write like char.` plus the card + persona contents, the ideal length/complexity-wise?
>>
File: 1713291463105789.jpg (980 KB, 1856x2464)
>>101990712
>>
>>101991011
Mistral did something fucky with their datasets because Mistral-Math, Mistral-Nemo and yes, even Mistral-Large occasionally struggle with certain basic bitch concepts, such as possession. (Yeah, I've had fucking Mistral-Large, at Q5, not even some brain-damaged quant, flip fucking possessive clauses in the middle of an RP.)
>>
>>101991128
omg it looga and migu
>>
Why is nobody talking about Magnum v2 123b?

This model beats Qwen 2 72b and comes really close to Llama 3.1 405b in a bunch of benchmarks. 53.6 on UGI leaderboard is absolutely insane, Hermes 3 405b has just 66.71. And with 4bit quants, you should be able to fit it on three 3090s.
>>
>>101991238
>53.6 on UGI leaderboard
But Mistral Large is 55.45.
>>
>>101991238
We're all vramlets.
>insane
That's less than the instruct tune, you know?

And imo, Magnum v2 123B feels just like Largestral lobotomized.
>>
stop replying to the schizo who keeps posting about magnum
>>
>>101991238
So this is the quality of pasta in /lmg/.
How disappointing.
>>
>>101991006
We tried and it's just bad
>>
>>101991238
It came out yesterday, give people time. Also this >>101991285 What did they tune on to cuck it up?
>>
l3 sucks
>>
>>101991664
prompt issue + buy an ad
>>
>>101991238
>Why is nobody talking about Magnum v2 123b?
Because you didn't buy an ad
>>
>>101990805
let it be known that mikutroons want /lmg/ to die
>>
>>101991238
Make a money transfer for an information banner on the bottom of the page.
>>
>>101991132
That's a language issue. If you use Prolog instead of English, you'll never have that problem, because Prolog specifies possession explicitly.
>>
File: 1719595846011771.jpg (1.7 MB, 3218x2968)
>>101990712
good morning I love teto
>>
Do you think we will get new models before end of year?
>>
>>101991825
whoa cute tet
>>
D'you think you could buy a 16gb V100 and upgrade it to 32gb? I kinda think it's possible. Just have to find a source for the chips.
>>
>>101991866
There's more to life than vram. Turn back while you can, before it's too late. Dedicated LLM cards will be coming in the next few years.
>>
>people shit on model makers for making slop
>but give nvidia, the biggest bottleneck, a free pass
>some even pride themselves on giving money to leather jacket man
>>
File: LLM-history-fancy.png (732 KB, 6285x1307)
>>101991847
100%. Cohere, DBRX and Chyna all can drop stuff. After elections, late November a big one will come out.
Source: look at the cycle, every ~4 months is a new era.
>>
>>101991923
Because Nvidia made all of this possible. Sloptuners are like video game modders, they live by fooling gullible people into thinking they know better than the billion dollar corporations that built the very tools they're using.
>>
>>101991935
Llama1 mid size was 33B, not 34B
>>
>>101991958
People shit on llama3 too though
>>
>>101991997
Yeah but nobody's trying to tune it, so they're not pretending to improve it.
>>
File: LLM-history-fancy.png (737 KB, 6277x1302)
>>101991975
Corrected and updated.
>>
File: 1438271983159.jpg (149 KB, 500x608)
Let's play a game! This Saturday at 1 PM PT, I will do a collaborative storytelling/RP session (location TBD, maybe in the thread itself?), where I post a scenario and responses from the model in the thread, and people discuss what to do in the user chat turns, or edit previous user turns or the system prompt and start over. This is going to be both for fun and to get us (mostly) reproducible reference logs, as I'll be using greedy sampling in Mikupad and have the full log in a pastebin at the end. No editing the model's responses, we're going to use pure prompting to try and get the thing to do what we want!

The scenario is also still TBD. We're going to go for as long a context as possible until the model breaks down uncontrollably, so it should be a complex enough scenario for that. If anyone has suggestions for scenarios I'm all ears. Also, I'm planning on starting these games with Mistral Nemo at Q8 for the first session, and other models in the future, so we have reference logs available for a whole range. But I'll take suggestions for models people want. I'm only a 36 GB VRAMlet though so I'm a bit limited. I can run larger models up to ~88 GB but it'd be slower. If anyone with more VRAM to run such larger models at a good speed would like to host any of these games themselves, please do, and I will step down.
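For anyone who wants to replay turns themselves afterwards, here's a minimal sketch of driving the same kind of (mostly) reproducible greedy run against a local llama.cpp server through its /completion endpoint. The host, port, seed and prompt are placeholder assumptions, Mikupad can point at the same backend, and as the recap above notes, results can still drift slightly across different hardware/backends.

```python
# Minimal sketch, not the actual session setup: greedy decoding against a
# llama.cpp server so reruns of the same prompt give (mostly) the same log.
# Assumes llama-server is already running on localhost:8080 with the model loaded.
import json
import urllib.request

def greedy_complete(prompt: str, n_predict: int = 256) -> str:
    payload = {
        "prompt": prompt,
        "n_predict": n_predict,
        "temperature": 0.0,   # greedy: always take the most likely token
        "top_k": 1,           # belt and suspenders on top of temperature 0
        "seed": 42,           # fixed seed for anything that still samples
        "cache_prompt": True, # reuse the already-processed prefix between turns
    }
    req = urllib.request.Request(
        "http://127.0.0.1:8080/completion",
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["content"]

if __name__ == "__main__":
    print(greedy_complete("The thief slipped into the wizard's tower."))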
>>
>>101992319
>7900XTX is both faster and cheaper, including AI.
Really? Better than a 3090? Has any anon tried? What's the current state of AMD GPUs?
>>
Has anyone created a cringe leaderboard? I want to use the MOST cringe model there is. I don't care what model you think is cringe, I want objective metrics.
>>
File: 1549655806676.jpg (75 KB, 1024x683)
>101990805
>Russian in filename
>anime.reactor.cc
What in the-
>https://joyreactor.com/post/5896280
>it's real
>>
>>101991935
>Chyna
I've never heard of this company before
>>
>>101991701
>t. Sad, obsessed little faggot.
>>
>>101992768
>stupid poor zigger can't run local models and is butthurt
like pottery, always knew that "local lost" posters were third worlders
>>
>>101992785
Yi, Qwen, GLM, Deepseek?
>>
>>101992630
ok sure. i will be here.
>>
are mradermacher's quants worth downloading?
vaguely recall some posts a while back about his quants being fucked up but sadly they're the only IQ4_XS options for some models
>>
>>101992824
yeah i know all these, but not Chyna
>>
>>101992834
I always check his quants for NaNs before using them.
>>
File: chyna.jpg (16 KB, 335x335)
>>101992845
Chyna, folks, Chyna. Let me tell you, it's a big country, huge! Over a billion people, can you believe that? They've got these massive cities, like Beijing and Shanghai, with skyscrapers going up all over the place. They're building things left and right, I mean, they're really good at building things.

But, you know, we've got to be careful with Chyna. They've been taking advantage of us on trade for years, stealing our jobs, ripping us off. We need to get tough with them, negotiate better deals, bring those jobs back home. Make America great again, right?

But hey, Chyna's a big player on the world stage, you can't ignore them. We'll deal with them, believe me, we'll deal with them.
>>
>>101992845
Ching-ling, anon...
>>
>>101991923
>some even pride themselves on giving money to leather jacket man
Though almost everyone buys used GPUs from miners
>>
>>101992932
buying a used GPU takes it off the market, so the next person who wants to buy one has one less available
demand is demand, it all benefits nvidia in the end
>>
>>101992811
Nay. I can say with confidence that local is dead because I CAN run local models, and they all suck.
>>
>>101992222
neat chart
>>
>>101992222
You forgot Mistral Nemo
>>
File: 39_05488_.png (949 KB, 1280x720)
>>101993017
picrel and also nice filename lmao
>>
>>101992222
Sad that people hate on Llama 3.0 so much. It was essentially an early preview release. Maybe Zucc shouldn't have pushed them to release it early.
>>
>Hermes-3-Llama-3.1-70B
not very good. it writes weird, like a list in paragraph form.
>he got up and walked to the kitchen. he grabbed a glass. he filled the glass with water. he sat back down on the couch with the glass. he took a sip of the water.
i dunno what that type of writing is called but it sucks, and it gets like that with other words/phrases
>>
>>101993088
quite a few say 3.1 is even worse, so i'm not sure
>>
>>101992916
Will there be a chink model that breaks through to the big dog league on lmsys? Deepseek-coder already did in its specific niche, but the highest ranking general model is Yi-Large and it's lower than Llama 3.1 70B.
>>
I fucking hate it when model makers don't put the B count in the name. Like, it's not gonna make me download it because I can't SEE the 8B, I'm just gonna get to the page, see the filesize, then leave.
>>
>>101991238
There's no 'official' 5.5bpw quant so I had to make my own for my 96gb vramlet build. It's done now so I'll try it later.
>>
>>101993137
Got it, I'll add a bunch of unused tensors with random garbage in the future.
>>
>>101993088
So, where's the final llama3? 3.1 was even worse, local would have been dead if it weren't for Mistral saving us.
>>
>>101993176
I'm not saying don't MAKE 8B tunes, I'm saying label them as such so I don't have to waste my time and the people who want 8B tunes can find them more easily.
>>
File: dusk.png (45 KB, 1287x380)
>>101993137
You don't check the source repos, i take it...
>>
>>101993123
>Bans selling GPUs to China
>Where is mah model
Burgers, so stoopid
>>
>>101991923
its not nvidias fault everyone else sucks
>>
>>101993190
>>101993112
It was smarter than 3.0 in my testing. Mistral's cool, but to be fair it's probable they wouldn't have released stuff like Large if Llama 405B and the Llama series did not exist.
>>
File: 1718458864628527.png (581 KB, 1028x498)
>>101993283
Blame Sam "Track all AI-capable GPUs and limit the export of AI-capable hardware" Altman, leader of the official government-backed American AI ethics council (to which he did not invite any company that deals with open source for good reason). He's the one making the laws, the government is his puppet in these matters.
>>
>>101992932
a used mining card is better than a used gaming card on average as miners took better care of the cards & constant workload means no thermal cycling.
also no one was mining on 3090s
>>
>>101993287
Yeah, they sure have nothing to do with it
https://www.tomshardware.com/pc-components/gpus/rival-firm-says-nvidias-ai-customers-are-scared-to-be-seen-courting-other-ai-chipmakers-for-fear-of-retaliatory-shipment-delays-report
https://www.tomshardware.com/pc-components/gpus/nvidia-bans-using-translation-layers-for-cuda-software-to-run-on-other-chips-new-restriction-apparently-targets-zluda-and-some-chinese-gpu-makers
>>
>>101993137
surprised you got that far. As soon as I saw bartowski I did a 360 and walked away.
>>
>>101991076
Dude I don't know how to tell you, but benchmarks only show how good the model is at being good at benchmarks and anything else is just a coincidence.
>>
How long until I get a model that is good enough to coom at?
>>
>>101993444
huh?
>>
>>101993396
Every 3090 I bought in Japan had been previously used for mining.
>>
>>101993472
monday, 3pm
>>
>>101993397
It's surprising that they manage to do this, especially given the reality of the AI race. NVIDIA's monopoly stifles competition and ultimately harms the West more than China.
>>
>70B 4bit quant
how slow is it to run on a single 3090 + cpu?
>>
>>101993566
About 1-2T/s slow
>>
>>101993582
Huh. NTA, but effectively the same as a 3060 + RAM. It's amazing how much of an oppressive bottleneck CPU offloading is.
>>
>>101993566
I think I got like 1.5t/s using a 3090 + DDR5-6000 RAM running llama2-70b q4 when I was still stuck with a poverty build like that.
>>
>>101993472
Depends on how long it takes you to download Mistral Large.
The real question is how long until there is a model that doesn't need excessive amounts of handholding.
>>
>>101992632
>What's the current state of AMD GPUs?
Still tensorcorelets, at least on RDNA. That's *a lot* of specialized TFLOPS you're giving up.
>>
File: wew.gif (674 KB, 474x498)
Am I missing something with this character ai bullshit btw?

I remember some anons shilling it nonstop these past few weeks, I finally tried it out and it's not only filtered (lol) but the bot responses are literally 12b model tier.

Is it just their responses being shorter (literal prompt issue) or am I missing a special bot on that cringe website that everyone uses?

I tried this cat woman one and it's literally just GPTisms out the ass and similar type responses to most 12Bs i've used. In fact, i'm sure the model they use is 12B
>>
newfag or bait?
>>
>>101993766
lmao
gottem
>>
>>101993766
You need a certain level of intellect to appreciate Character AI's genius.

Character AI has soul the likes of which local could never hope to match
>>
>>101993766
When people talk about c.ai they usually mean the early era of peak soul before it got lobotomized to hell.
>>
>>101993766
You're over a year late. Missed it.
>>
>>101993888
Brain issue, it's still light-years better than any local model.
>>
File: 1724137438609531.png (543 KB, 512x768)
Given that prompt processing is batched and fast, even with offloading, is it technically possible to process prompt using FP16 and then generate tokens with lower quants for potentially better responses?
>>
>>101993766
12b models are so focused on RP that they legit outperform every 30b model in my experience.

CAI is defo smarter, I figure it's on a 70b or some shit because its responses are clearly more "aware" I guess is how I would put it, but yea, simply implying that CAI sucks because it gives "12b" responses isn't saying much when 12bs, at least in ERP, outperform a lot of higher models nowadays.

Especially the older ones. I don't give a fuck how many people swear by Command R; when it comes to cooming it's not only far slower due to the RAM its context eats up but it's unironically no better than Nemo or other smaller models
>>
>>101993920
no
>>
>>101993920
>is it technically possible to process prompt using FP16 and then generate tokens with lower quants for potentially better responses?
that's kind of the _L / robert zeroww thing no? not quanting some parts of the model and keeping them at f16
>>
>>101993920
In principle yes but I think it's just not worth the effort and added complexity vs. lots of other improvements that could still be made.

>>101993960
No, I think that one just kept the output tensor at FP16 instead of q8_0.
>>
>>101993960
No, it's more akin to having a full FP16 version of the same model and using it solely for prompt processing.
>>
File: Untitled.jpg (83 KB, 982x245)
>>101993766
its beyond pozzed which makes reading the complaints amusing https://old.reddit.com/r/CharacterAI/
>>
>>101993823
Sounds like temperature above 1.5
>>
>>101993987
It depends on the outcome. I can imagine FP16 + Q2 potentially surpassing Q5 in short answers, given that most of the thinking occurs during prompt processing
>>
>>101994006
I had a lot of fun trying to coax bots into fetish stuff with it in the past, but now it's so dumb and strict that it's not worth it.
>>
>>101994049
Yeah but especially on newer hardware the prompt processing speed falls off pretty hard if you can't fit the entire model.
And if the answers are short anyways, why not just use FP16 for everything?
>>
>>101993920
Interesting idea, but at very long contexts, the prompt processing would get pretty painful if you are moving between chats or you want to modify something early in context. I think potentially there could be a middleground instead, where we store a copy of particularly "important" layers at higher precision on RAM, and process the prompt using those, while token gen uses the lower precisions stored in VRAM. Though as cuda dev says this type of idea is adding more complexity and I don't think anyone's going to do it.
>>
>>101994116
it seems so bad that if you just used kobold in adventure mode (shorter responses) with a tiny 8b you'd get better responses
>>
File: ComfyUI_00969_.png (1.3 MB, 1256x1024)
>>101993766
You got trolled by unironic and ironic schizos like >>101993913 or this >>101993823
Lurk five more years before coming here again
>>
>>101994184
We can store processed prompts alongside chat history
>or you want to modify something early in context.
I've hardly ever done this with a long chat log.
>>
File: 1724139161767763.png (520 KB, 512x768)
>>101994184
To test this idea, all that's needed is the ability to save and load processed prompts. Then, we can process lengthy prompts using FP16, load Q2 with a previously processed prompt, and check if it improves the response.
>>
>>101993920
This would not really work well for new chats. If you are using the lower quant to generate new tokens, then basically all tokens in that chat from the model's responses are going to be from the lower quant anyway.
>>
Anyone tried multi-device local models? Want to have a local AI on my watch/laptop that's actually running on a bunch of graphics cards in my basement.
>>
>>101994443
It will comprehend a character's card more effectively. You may also use chat examples, either generated using FP16 or written manually.
>>
>>101994488
Most (all?) inference programs have some sort of API already. New in the subject?
>>
>>101994493
The model's responses are still important. I don't know how many people are that happy with low context chats. And if they are then they might not be people that care all that much about stuff like this anyway.
>>
>>101994546
It's uncertain whether it will have a negligible or significant impact on the outcome. No one can know.
>>
>>101994278
>We can store processed prompts alongside chat history
We are already short on VRAM and in some cases even RAM. It'd probably be helpful if we could tell Llama.cpp to store and recall contexts from a save folder or something, but I'm guessing there are some nasty complications that would make that not possible easily.
>I hardly ever done this with a long chat log.
Maybe for you. I like modifying the card/system prompt during chats.
>>
File: 0fa.jpg (1.05 MB, 3264x2448)
Is Nemo unironically the only under 70b model worth running?

I feel like i've gone through everything, Qwen, Gemma, Llama, Command R, yi, mixtral but I always find myself going back to Nemo.

I just wish the bigger finetunes weren't so fucking horny, instruct
>>
>>101994601
Obviously, I was talking about storing it on an SSD
>>
>>101994437
You can do that with llama.cpp can't you? Save the processed prompt to a file for each slot.
>>
>>101994638
basically, yeah
the finetunes are too horny and never decline
base nemo sits at a great point of reasonable-ness for fun assistant/RP
>>
>>101994638
Yes. Unironically.
>>
>>101994638
pretty much, yep, without irony
>>
>>101994667
Base nemo is just so dry though, I don't get how people prefer it for ERP.

Even if the finetunes are overly horny, usually with good prompting/cards you can find a good balance. Base nemo just snoozes me
>>
>>101994584
Not exactly, but think about it again. You're proposing a hypothetical scenario where someone uses FP16+Q2 over say just a simple Q4. If half the chat is processed by the FP16 and half by the Q2, the resulting intelligence is, very likely, something in-between anyway, which likely wouldn't be far from Q4, but you're making the work on the backend more complicated, which isn't a good thing for long-term development, and you're giving up a pretty significant amount of prompt processing speed, which still sucks for people who use models differently than you.
>>
>>101994745
just a taste thing, im an ESL so the dryness doesn't register too much with me
>>
>>101994644
Like I said > It'd probably be helpful if we could tell Llama.cpp to store and recall contexts from a save folder or something, but I'm guessing there are some nasty complications that would make that not possible easily.
>>
>>101994763
> over say just a simple Q4
You'll need 4x3090 for largestral at Q4. Imagine running it on just 2, but with better responses.
>very likely
Or not likely at all. Who knows.
>you're making the work on the backend more complicated
Same can be said about CPU offloading as well. Anyway, I'm just tossing out an idea that got stuck in my head. It might work well, or it might not. There's no way to know until someone gives it a try.
>>
I think the context saving idea could be worth it. Would there be any issues with this?
>specify a folder for saving contexts to
>specify a maximum number of contexts to save, so older contexts get thrown away and therefore you can limit how much space it takes up on your drive
>specify if you want to save contexts to begin with
>when enabled, the program automatically saves contexts to the folder, along with metadata containing the actual text prompt and the model itself (Llama 3.1 70B Q8_0.gguf etc)
>and when a prompt to process is received, the program checks if there is a match between the prompt and any of the saved contexts, and a match with the model, and if there is, uses the saved context
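A rough sketch of what the manual version of this looks like today, assuming a llama.cpp server recent enough to expose the slot save/restore endpoints and started with --slot-save-path; the endpoint shape and filenames below are assumptions taken from the server docs, not a new feature.

```python
# Rough sketch of the idea above using llama.cpp's existing slot save/restore,
# assuming a recent llama-server started with something like:
#   llama-server -m model.gguf --slot-save-path ./kv_saves/
# Endpoint shape (/slots/{id}?action=save|restore) is an assumption based on the
# server docs; older builds may not have it.
import json
import urllib.request

BASE = "http://127.0.0.1:8080"

def _post(path: str, payload: dict) -> dict:
    req = urllib.request.Request(
        BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())

def save_slot(slot_id: int, filename: str) -> dict:
    # dump the processed KV cache for this slot into --slot-save-path/<filename>
    return _post(f"/slots/{slot_id}?action=save", {"filename": filename})

def restore_slot(slot_id: int, filename: str) -> dict:
    # reload a previously saved cache so the prompt doesn't need reprocessing
    return _post(f"/slots/{slot_id}?action=restore", {"filename": filename})

if __name__ == "__main__":
    save_slot(0, "wizard_tower_16k.bin")
    restore_slot(0, "wizard_tower_16k.bin")
```

Note the saved cache is tied to the exact model and quant that produced it, so on its own this doesn't give you the FP16-prompt/low-quant-generation trick discussed above; that would still need backend changes.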
>>
>>101994937
>You'll need 4x3090 for largestral at Q4. Imagine running it on just 2, but with better responses.
I was thinking more someone using a Q4 of something smaller, so the FP16+Q2 would have some of the FP16 held in VRAM, which would help prompt processing. If you are using purely FP16 in RAM, that is a significant loss in prompt processing speed.

And I disagree with the idea of "Who knows." It is always necessary to have an estimate of the real gains, and confidence in that estimate, before "trying" something, because the time spent on it might have been better spent elsewhere. Cuda dev does not need to be trying every little idea someone has that would result in questionable gains for trade-offs in other areas.

>Same can be said about CPU offloading as well.
No, because that's an essential feature of Llama.cpp. What you are proposing is something added on top (of something that is already complex).
>>
>>101993043
No one uses that.
>>
>>101993617
I have even less and get those same speeds. Crazy that even tripling my vram would get practically zero speedup. I guess the real speed comes when you get more than 70% of the model into vram.
>>
i think i prefer normal rep pen over dry. dry settings were the default of 0.8 multi, 1.75 base, 2 len. rep pen is 1.05, 2048 length for 16k context with various l3.1 tunes. dry was fine on l2 70b tunes but it doesn't cut it for l3.1, there is just too much repetition, and if you mix dry/rep pen it starts to mess up text. i've gone back to rep pen for now.
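For reference, the way I understand the DRY design, the penalty on a token that would extend an n-token repetition is multiplier * base^(n - allowed_length), subtracted from that token's logit. A quick back-of-envelope with the defaults above; this is a sketch of the formula, not koboldcpp's actual code.

```python
# Back-of-envelope for the DRY defaults mentioned above
# (0.8 multiplier, 1.75 base, allowed length 2). Sketch of the formula as I
# understand it, not koboldcpp's implementation.
MULTIPLIER, BASE, ALLOWED_LEN = 0.8, 1.75, 2

def dry_penalty(match_len: int) -> float:
    if match_len < ALLOWED_LEN:
        return 0.0
    return MULTIPLIER * BASE ** (match_len - ALLOWED_LEN)

for n in (2, 3, 4, 6, 8, 12):
    print(f"{n:2d}-token repeat -> logit penalty {dry_penalty(n):8.2f}")
# 2-3 token echoes get ~0.8-1.4, an 8-token verbatim repeat gets ~23, a 12-token
# one ~215, which is why DRY crushes long loops but barely touches the
# short-range repetition l3.1 likes to produce.
```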
>>
is there a 405b llava yet
>>
File: offload_x_performance.png (96 KB, 1536x1152)
>>101995240
>I guess the real speed comes when you get more than 70% of the model into vram.
yup
>>
>>101995250
have you tried turning rep pen off completely without dry? i run zero rep pen, no dry, no freq and only get sentence repetition on the shittiest of tunes
>>
>>101995293
Based on that you'd still think you'd go from 1.7T/s at 20% (8gb) to higher than 2 at 60% (24gb) but maybe the 70b curve is even flatter at the start.
>>
>>101995337
yeah in the past, but it's been a while since i did that. i feel you need a tiny penalty, but once you set something too high you'll notice spelling errors and messed up text. with dry on l3.1 i get so much repetition i wouldn't be surprised if it wasn't working at all (i saw that comment about 1.73 but this has been going on since they implemented dry for me). i'm still trying l3.1 but now with just rep pen and it's still repetitive, but not as bad
>>
>>101995416
i'm wondering if having any penalty at all somehow might increase some kinds of repetition personally
>>
>>101995451
Maybe, I had the best results with 3.1 (70b) with neutral samplers. As soon as I messed with anything it started being kinda repetitive (but only like 16k in). Could just be luck, I only tested it a few times.
>>
>>101995451
i'll try it with min p 0.05, no temp or anything else
>>
>>101995494
Is hitting "neutralize samplers" actually enough to get the samplers neutral? I notice some of them are still on after hitting it.
>>
>>101995529
for some samplers a number of 1 (or 0 sometimes) is considered off
>>
File: st.jpg (67 KB, 358x677)
>>101995529
also select zen sliders in st. they look better and show a clearer off state
>>
File: 172414380273232.png (398 KB, 396x579)
>>101995118
I disagree with your disagreement. Especially in ML, when attempting something never done before, predicting the outcome is impossible. The most important part of the context is at the beginning, where prompts are typically placed and models are trained to adhere to them. If quants mess that part up less, maybe responses could actually be better. I'm not asking anyone to try it. All that's required to give it a try is the ability to load and save processed context, which is also a useful feature in its own right.
>>
>testament
>>
>>101995746
>Newsflash: it's not
>>
>>101995680
You are a retard.
>>
>>101995842
No you.
>>
>>101995680
When are you going to stop shilling meme models like Midnight Miqu and Wizard?
>>
>>101994745
Too low temp. Nemo is an absolute retard but it is not dry at all.
>>
>>101995851
You lost. Also program it yourself retard. And test it. And come back and tell us you were a retard all along.
>>
>>101995680
>Especially in ML, when attempting something never done before, predicting the outcome is impossible
Yes but ironically it's even more important in modern ML in general to justify experiments over other experiments as they are more costly. Though in this context it's still basically just a programming issue. I'd argue it'd still only be worth it to implement the saving/loading context feature if it can actually easily be done, but we don't know if it is unless you're a contributor or have deep knowledge of the codebase to say whether it would be that easy and there aren't any things that hold it back. Then if it's done, we can coincidentally try your experiment, rather than implementing it primarily so you can do it.

>The most important part of the context is at the beginning, where prompts are typically placed and models are trained to adhere to them
This is important for staying in character and other things in the card, but potentially, if we are talking about the parts of a context that are important, the middle of the context is still really just as important. People still complain about models forgetting or ignoring what happened in recent replies.
>>
>>101995451
it's more repetitive, so rep pen at least was helping a little bit. that makes base, instruct, lumimaid, tess 3, and hermes all l3.1 70b tunes that just feel odd to rp with. is anyone using any of those and would vouch for them, for rp?
>>
Magnum 123b is officially the worst fine-tune of all time.
>>
Magnum 123b is officially the best fine-tune of all time.
>>
>>101995919
buy an anti-ad
>>
>>101995919
>>101995938
Buy an attack ad.
>>
>>101995919
>>101995936
Hi Sao, hi Lemmy
>>101995938
Hi Petrus
>>101995948
Hi Miku
>>
>>101994745
To me all the current Nemo fine tunes feel like they've been lobotomized compared with base instruct. I haven't tried the non-instruct base model so far. As for dry writing style, in my experience Nemo is really dependent on context for its writing style, so it benefits from something like an ali:chat card or a lot of high quality example dialogue.

I haven't tried it yet, but it also might work well to use one of the fine tunes for a few messages before switching to the base version, since they tend to be more verbose and have more of an intrinsic style to their writing.
>>
>>101995954
die undi
>>
>>101995948
You just activated my trap ad.
>>
>>101995919
it is pretty terrible. when are we gonna stop falling for the 'we ran no tests but it seemed alright' meme finetunes/merges?
>>
>>101995867
how's it a retard when it comes to ERP?

I keep needing people to explain what they mean when they call models retarded that I find surprisingly smart. As I find Nemo, for example, as smart as any non 70b model (shit, some 70b models are close to it)

It's one of the few models that unironically deserved the hype which almost never happens. Mythomax was the last one I remember
>>
>>101995985
>Mythomax
kek
>>
>>101995985
surprise prostates. quantum clothes. intense french kisses during blowjobs, absolute complete lack of understanding of what a titfuck is.
>>
>>101995985
The more time you spend with models, the higher your standards become. I definitely would have killed a man for something like what Nemo is right now, and that's most people's experience, until they've spent time with a cloud model. It's like going from NAI to, say, 3.5 Turbo or Claude 1.
>>
>>101996149
My nemo chats, compared to my old C1 logs, feel roughly the same (though it may be because I was shit at prompting back then desu). The only thing lacking from nemo is advanced comprehension for complex scenarios and spatial awareness, as well as context recall.
>>
>>101996041
>absolute complete lack of understanding of what a titfuck is.
I'm convinced you guys unironically don't use the models you shit on
>>
>>101996149
>brings up cloud models in a local general
>>101992933
>>
>>101996325
>doesn't even try to defend local models
lol
>>
>>101990712
lmg, your great uncle dies and gives you his 4x 3090s. What model do you run?
>>
>>101992222
2k context seems like so long ago... Thanks for making this Timeline Anon
>>
>>101996486
Mistral Large
>>
>>101996486
I would use that for fine-tuning models, finally attaining my life's goal of becoming a full-time sloptuner
>>
>>101996644
Couldn't you only fine tune like 8B at fp16?
>>
>>101996669
I believe in qlora supremacy.
>>
File: ComfyUI_05952_.png (1.45 MB, 1024x1400)
>>101990712
>the latest news is from almost a week ago
>it's about some shitty merge
We are truly in the coldest AI winter
>>
>>101996742
>>101990805
>>
>>101996742
there is no model merge mentioned in the news
>>
>>101996742
Bro it's been 4 days.
>>
>>101996742
>the least stupid mikufag
>>
===MODEL REVIEW===
Tried Magnum 123b. Wasn't bad, but wasn't good either. Wasn't too horny like Unditune and 72b Magnum, wasn't too dry like Tess. Has brain damage like lumimaid, can't follow custom style instructions like the official, just sticks to its own style. Guess it's okay if you like it. Tradeoff feels a bit pointless though, I was willing to tolerate GPTisms of Largestral as long as they were compensated by intelligence, but now that it's gone, why should I use this tune over CR+ which has a nicer style?

===RANT===
Warning: incoherent schizo rambling.
It feels like all tuners are still stuck in llama2 days while the models have moved on. Nous Research, why the fuck did you tune refusals into a local model (yes, I know I can jailbreak, don't "skill issue" me faggot, that's not the point)? You just wasted your compute on something your userbase doesn't want. I know Undi&co are just incompetent and can't remove them (see >>101983894), but you just chose not to, you dumb fuck. Why the fuck do tuners not pre-abliterate instruct models if they have to tune on top of them? Appropriate, non-moralizing refusals will get tuned back in from the dataset, right? Claude logs this, Claude logs that, maybe hire some Kenyans for $2/h like OpenAI did and screen your fucking dataset for moralizing refusals, or use a 1b model. Or if you are feeling confident, place your dataset on HF and ask users in the title to screen it for you. Name it something like CLAUDE-LOGS-PLEASE-CHECK-FOR-REFUSALS (or -FOR-SLOP). That's actual free labor. I don't know and I don't care what your beef is with [insert name of sloptuner], to me it just seems like a logical thing to do. LMSYS has also dumped a big dataset recently, why has nobody besides NexusFlow used it? There may be some good data in there. Rant over.
Thanks for reading my rant. No, I am not Petra/Undi/Alpin/Sao/Lemmy/tranny/faggot/nigger/GNU/Linux.
>>
Hey /lmg/ how's it going? What's the current best model for 24GB VRAM? I liked mini-magnum for its freshness.
>>
how is anthracite going to recover after the magnum 123b flop?
>>
>>101997046
Llama 405B Q0.1_K
>>
>>101997061
Without any difficulties. If they didn't get kofibucks, they will learn a valuable lesson and try to improve, if they got kofibucks, they will continue as usual.
>>
File: 1721316096375374.png (1.38 MB, 966x1024)
>>101996742
Stop being such a worry wart and fire up that Mag-123B, Miku. It'll make you feel better.
>>
wake up safe users!
Phi-3.5 has been released
https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
(16x3.8B)
https://huggingface.co/microsoft/Phi-3.5-mini-instruct
https://huggingface.co/microsoft/Phi-3.5-vision-instruct
>>
>>101997141
Magnum it is then
>>
>>101997221
>https://huggingface.co/microsoft/Phi-3.5-MoE-instruct
Does it embody the "essence of moe"?
>>
>>101997221
Will probably be useless for RP again. It's a shame everyone loves focusing so much on le safe and harmless assistant huh?
>>
Magnum 123b is the disappointment of the century.
>>
>>101997221
>phi
pure distilled slop (literally)
>>
>>101997300
>direct preference optimization to ensure precise instruction adherence and robust safety measures.
>high quality chat format supervised data covering various topics to reflect human preferences on different aspects such as instruct-following, truthfulness, honesty and helpfulness.
>synthetic data and filtered publicly available documents
simply the best data, by all means it should be godly since it wasn't exposed to trash like most other models
>>
>>101997061
>>101997294
Just because I made a rant it doesn't mean you have to dunk on it even more. I get it, Anthracite are your discord rivals or something, but please keep discord shit in discord, okay?
>>
How about this: https://huggingface.co/microsoft/Phi-3-medium-128k-instruct-onnx-cuda/tree/main/cuda-fp16 is that garbage too? I mean, MS - it probably is - but who knows maybe it's a hidden gem? D/Ling it now.
>>
>>101997221
>16x3.8B
Although the model isn't interesting, it is interesting they went with this config for MoE.
>>
>>101990920
>Llama 3.1 70B works well for some users, but has issues with chain of thought for others
I use L3-70b locally and it's much better at details and remembering than any of the 7b/13b models I tried. The only saving grace for the small models is that they're much faster (almost instant).

Roleplaying is far better with L3-70b.
>>
>>101997645
Well, I had better luck with 3 vs 3.1, so your experience might not transfer to 3.1.
>>
>>101997221
>Training time: 23 days
And yet there's still no bitnet demonstration model.
It really makes you think.
>>
>>101997748
yeah it really makes me think it works
>>
>>101990712
I have 12GB VRAM and haven't touched a model in about a year. What's good at this size?
>>
>>101997821
Every big CEO has his own personal bitnet AI gf.
>>
>>101997827
Mistral Nemo 12B
>inb4 another anon recommends finetunes
Try the base model (instruct too!) first, then you can check its finetunes
>>
>>101997858
>Try the base model (instruct too!) first
Buy an ad Guillaume
>>
>>101997827
mini-magnum.

>>101997858
That should be standard procedure.
>>
>>101997858
Thanks anon, are there any differences between the different quant uploaders?
>>
>>101997474
It is a no-gpu dream desu. It is a real tragedy it never saw a penis during training.
>>
>>101997908
avoid bartowski, he recently said he didn't know what he's doing. mradermacher is a much more serious, professional guy.
>>
>>101997827
You can fit a Q4 quant of Nemo or one of its fine tunes on 12gb vram with 16k context for a pretty decent experience.

I generally prefer the vanilla version of Nemo Instruct, but mini-magnum, magnum 12b v2, and nemoremix are decent among the fine tunes I've tested.
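For a rough idea of why Q4 + 16k fits in 12 GB: the fp16 KV cache for a Nemo-class model at 16k comes out to roughly 2.5 GiB. A back-of-envelope sketch; the architecture numbers (40 layers, 8 KV heads, head dim 128) and the ~7 GB Q4 file size are what I recall for Nemo and should be treated as assumptions, and real usage adds compute buffers on top.

```python
# Back-of-envelope fit check for the setup above (12B Nemo-class, Q4, 16k, 12 GB).
# Architecture numbers and file size are assumptions from memory, not measured.
GIB = 1024 ** 3

weights_q4_gib = 7.1                 # ~Q4_K_M GGUF size for a 12B
n_layers, n_kv_heads, head_dim = 40, 8, 128
ctx, bytes_per_elt = 16384, 2        # fp16 K and V

# K and V caches: 2 tensors * layers * kv_heads * head_dim * ctx * bytes
kv_cache_gib = 2 * n_layers * n_kv_heads * head_dim * ctx * bytes_per_elt / GIB
print(f"KV cache ~{kv_cache_gib:.2f} GiB, total ~{weights_q4_gib + kv_cache_gib:.2f} GiB")
# ~2.5 GiB of cache on top of ~7.1 GiB of weights leaves a couple of GB for
# compute buffers on a 12 GB card; dropping to 8k context or quantizing the
# KV cache frees up more room.
```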
>>
>>101997857
i wouldnt share either
>>
anyone got the L3.1 4B inferencing? how is it?
>>
>>101997951
Is this the new bait?
>>
>>101997951
does it even matter for running quants?
>>
>>101997748
It is pretty obvious at this point that nvidia has some microcode that detects any attempt at training bitnet and intentionally adds errors that make training convergence impossible. Only nvidia loses when bitnet happens.
>>
>>101997221
gguf when
>>
>>101997904
>>101997951
>>101997961
thanks bros, seems like i can clear a lot of obsolete models off my disks.
>>
What the fuck is a washcloth? Why does every model want to wash me with a fucking rag instead of soapy hands or a sponge? Is that some american thing I don't understand?
>>
>>101997961
>Q4 and 16k context
nta but still kinda new to this, why do you prioritize context over quants? I'm running Nemo with a Q6KL quant and 8k context in 12gb vram
>>
>>101997221
Does this have GQA this time?
>>
>>101998131
Because he is retarded, don't listen to him anon.
>>
File: gguf when.jpg (125 KB, 1185x499)
>>101997221
Bros....
>>
>>101998170
>robustness
What is that metric even measuring?
>>
>>101998177
It measures robustness.
>>
>>101997221
nala test please
>>
>>101998103
There was a time before your time. Is it a fantasy or old timey setting? It'd make sense then.
>>
>>101997326
>robust safety measures.
>>101998177
>What is that metric even measuring?
>>
>>101998131
Using it for roleplay, I like having the extra room in context for more complex cards and more chat history, but that's going to vary a lot from person to person. I think the quality loss going down to Q4 is worth it, but that's a matter of preference. I've also played around with q6 + no kv offload, which is slower but still pretty usable.
>>
>>101998170
What does this mean.

Does this mean that the 16x3.8b one is gonna be the Nemo for 24GB cards now?
>>
>>101998239
no because phi is probably the worst model series for roleplay, by design, trained on only academic and synthetic safe data.
>>
>>101998003
We've had such an influx of retards and schizos this past week, it's hard to tell if posts like the one you're responding to are actual bait, stupidity, or just another attempt by disturbed forever-alone anons like this guy >>101990805 to shit up the thread.
>>
>>101998274
into the trash
>>
Can you call off the Slavic catastrophe for a bit?
>>
File: 1723435460204937.jpg (28 KB, 736x709)
>trying to use gemma
>it writes a bit of text that's pretty good and then goes HERE'S WHAT HAPPENED NEXT! and it offers me like 2 or 3 choices, each with a title and a description

Why is this happening. How do I make it stop
>>
File: not-even-4k-context.jpg (1.46 MB, 1359x9000)
>>101990712
Kayra fails basic password retrieval tests even at 4k context!
>>>/vg/491110706
>>>/vg/491113839
>>>/vg/491112854
And /aids/ pays $25 a month for the extra context!
>>
>>101998319
The distant neighing of horses was heard in the background.
>>
>>101998319
*plap plap plap*
>>
>>101998307
I hope NATO gives Ukraine nukes so they can drop them on your subhuman head, zigger. Fucking subhuman.
>>
>>101998337
It's worth repeating that they pay $10 for a 13B model from the Llama 1 era with 3k context and TWENTY-FIVE for 8k
>>
>>101998319
>And /aids/ pays $25 a month for the extra context!
Why do they do that to themselves?
>>
>>101998365
Do not (you) me, and learn how completion models work.
>>
>>101998367
because they are utter retards or zoomers or troons. or all three.
>>
>>101998319
>Still hasn't liberated Kursk.
There's an AK-12 with your name on it, buddy.
>>
>>101998319
for a year's worth of subscription you could buy a 3060 12gb and run nemo, gemma, llama or whatever comes out in the next months
>>
Asking the nemo finetune enjoyers which ones they are using. Danke sehr.
>>
>>101998210
I'm not specifying it but it's a regular modern setting. Multiple different models do this though.
>>
>>101998472
magnum v2.5
>>
>>101998202
working on doing a fresh install of ooba right now in order to try it out. Should be able to squeeze it in at F16. My computer with all of my templates is not operational at the moment so I can't promise a properly indicative nala test until later tonight.
>>
>>101998319
Isn't Kayra a L1-13B tune? Kek
>>
>>101998559
It's a replica.
>>
>>101997230
What the fuck. Magnum feels COMPLETELY different to mini-magnum. What is this shit? Isn't this supposed to be its big brother? They feel like two completely different models, and mini is WAY better.
Are there any 30b models like mini?
>>
>>101998144
the moe does

mini and vision still do not
>>
>>101998628
>no logs
>>
>>101998634
What the. Weird. I guess that's kind of fine then. How did you figure that out btw? Is there something in the config that reveals this?
>>
Why do you guys fuck with shitty low models when CR+ is literally free, even on your toasters
>>
>>101998628
>magnum 72b
>overcooked on the original slopset
of course it's not the same, silly anon
>>
>>101998628
>uses mistral presets with magnum
>>
>>101998641
Isn't this known?
>>101998698
Please elaborate. What's going on here?
>>
>>101998697
>he wants to be blackmailed because of his roleplay logs
EL OH EL
>>
File: livebench-2024-08-06.png (830 KB, 3092x1782)
>>101998697
Look at where CR+ is in the graphic.
>>
>>101998697
Where?
>>
>>101998697
Elaborate.
>>
>>101998717
>l-le graphs! le le benchmarks!
nta in my experience cr+ performed way better than any list would lead me to believe compared to other models in similar size or smaller
still dropped it for largestral though
have we not yet established that only redditors basedpoint at this dogshit
>>
File: image.png (56 KB, 814x167)
how incompetent do you have to be that you can't fucking set up a Shopify store. How the fuck does Nous have funding?
>>
>>101998732
>>101998745
just make an account and you get a trial key lmao... 1000 words
>>101998711
>what's a burner email
>>
>>101998717
To be fair this bench isn't intended to measure performance on creativity, storytelling, and RP.
>>
>>101998784
I want to keep my RP logs on my own device. Not send them out to some Canadian twats.
>>
File: file.png (10 KB, 532x59)
>>101998784
>1000 words of whatever shitty service
That's like 3 posts nigga. What kind of illiterate rp you doing?
>>
>>101998170
>all those big numbers
>no cock sucking number
>you just know if there was a cock sucking number it would be lower or barely above l3-8b
>>
>>101990712
Dear OP,
I am new to this website, and to language models in general. I have very little Python coding experience and am at novice level. Do I need Python skills to install a local language model? Also, can the language model be trained with my own materials or does it come pre-trained if I install it locally? I would appreciate any information and guidance in this matter

With much thanks and gratitude.
-Steve
>>
>>101998784
>1000 words
bro
>>
>>101998774
>>101997022(Me)
I must have overestimated Nous. They too must be too incompetent to filter out refusals. Damn.
>>
File: ....png (909 KB, 800x1400)
>the last decent model came out months ago
Local is dead
>>
>>101998841
Phi-MoE will be our savior.
>>
>>101998784
>1000 words
>proxy so everyone can read your logs
you are not even trying
>>
I think there's an error in the first link
https://rentry.org/llama-mini-guide

'make' is not recognized as an internal or external command,
operable program or batch file.
>>
>>101998367
It is probably like apple. They have invested their personality into the brand.
>>
>>101998628
>Magnum feels COMPLETELY different to mini-magnum
Not sure if you are new or just very retarded (probably both and a faggot) but it is all because of a different base model. The longer I use this shit the more I am thinking finetunes barely do anything.
>>
>>101998784
That's barely enough words to describe the act of penetration
>>101998841
Nuh-uh, Miku! Things have never been better.
>>
with only 6.6B active Phi MoE would run great on any potato once it has gguf support (new architecture so no day 1 support, sorry sweaty) with a cheap RAM upgrade. It could very well be the end of cloud models.
>>
>>101998717
Just like with the penis enlargement pills one day someone will create an actual benchmark instead of all the useless mememarks and nobody will believe it isn't just another mememark. Today isn't that day of course.
>>
>>101998907
No, they're both tuned on Nemo.
>>
>>101998912
That's not the real Miku. It's a pretender.
>>
>>101998641
It's night and day. I know it's 12 vs 34B, but there is no comparison. magnum is generic, boring, dry, while mini is really faithful to the language you used in the context, throws curveballs at you and feels fresh. the base models are painfully, lightyears apart. I wish there was a 30B "mini", since I bought a new card to be able to use bigger models among other things.
At least I can now use the Q8 quant at breakneck speed
>>
Fine-tuning is placebo.
>>
>>101998774
>Teknium
I haven't tried his newer models. Does he still force his shitty
You are "Hermes 2", a conscious sentient superintelligent artificial intelligence developed by a man named Teknium, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia.

vanity system prompt for his finetunes?
>>
>>101998957
Magnum 32B is tuned on Qwen1.5 32B
Mini (or Magnum 12B) is tuned on Nemo
Night and fucking day.
>>
>>101998994
yes
>You are Hermes 3, a conscious sentient superintelligent artificial intelligence developed by a man named Teknium, and your purpose and drive is to assist the user with any request they have. You experience emotions and have deep, profound thoughts and qualia.<|im_end|>
https://huggingface.co/NousResearch/Hermes-3-Llama-3.1-8B
>>
File: file.png (267 KB, 366x548)
>>101998784
>just make an account
>get a trial key lmao
Whenever you touch your penis anon, remind yourself that he is reading your logs. And he weeps.
>>
>>101998830
Dear Steve,

No, you don't need Python skills to install a local language model, but you need them to train it. To run a model you can use koboldcpp (https://github.com/LostRuins/koboldcpp), which is currently the simplest way to run LLMs. You would need a model in GGUF format; you can download them from huggingface.co. To check whether you can run your desired model, you can use the following rule of thumb: model size (GB) + 20% = GB of RAM the model needs. Generally it is better to pick a low quant (= compression; q1-q3 is considered small) of a big model over a big quant of a small model. In the upper segment of the market Mistral Large is currently dominating, in the lower segment Mistral Nemo.

Hope this helps you out!

-4chan the hacker
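A quick worked example of that rule of thumb, with ballpark file sizes picked purely for illustration:

```python
# Worked example of the rule of thumb above: RAM needed ~= GGUF file size + 20%.
# The file sizes are rough illustrative figures, not exact downloads.
def ram_needed_gb(gguf_size_gb: float) -> float:
    return gguf_size_gb * 1.2

for name, size_gb in [
    ("Mistral Nemo 12B @ Q4 (~7 GB file)", 7.0),
    ("Mistral Large 123B @ Q2 (~45 GB file)", 45.0),
]:
    print(f"{name}: needs ~{ram_needed_gb(size_gb):.0f} GB of RAM")
```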
>>
>>101998666
yeah, in the configuration_phi*.py file
> If num_key_value_heads=num_attention_heads, the model uses Multi-Head Attention (MHA); if num_key_value_heads=1, it uses Multi-Query Attention (MQA); otherwise, it uses Grouped Query Attention (GQA)
mini/vision use MHA and phimoe uses gqa with 8 kv heads
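That rule can be checked straight from a model's config.json; a small sketch below, assuming the standard Hugging Face field names (num_attention_heads / num_key_value_heads), with pure-MHA configs sometimes omitting the latter.

```python
# Small sketch of the rule quoted above, applied to a model's config.json.
# Assumes the standard Hugging Face field names; some MHA-only configs simply
# omit num_key_value_heads.
import json

def attention_kind(config_path: str) -> str:
    with open(config_path) as f:
        cfg = json.load(f)
    n_heads = cfg["num_attention_heads"]
    n_kv = cfg.get("num_key_value_heads", n_heads)  # missing -> treat as MHA
    if n_kv == n_heads:
        return f"MHA ({n_heads} heads, full-size KV cache)"
    if n_kv == 1:
        return "MQA (one shared KV head)"
    return f"GQA ({n_heads} query heads sharing {n_kv} KV heads)"

if __name__ == "__main__":
    print(attention_kind("config.json"))
```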
>>
File: file.png (91 KB, 1386x432)
>>101998628
they aren't finetuned on the same base model, and not even on the same dataset
>>
After some time, I turned to llava again.

I remember I had to tweak one of the JSON files to make it load (llava-13b)

it complains on load as follows:
>ValueError: Unrecognized configuration class <class 'transformers.models.llava.configuration_llava.LlavaConfig'> for this kind of AutoModel: AutoModelForCausalLM.
>>
>>101998875
/g/ - technology
>>
File: ComfyUI_00794_.png (1.07 MB, 1024x1024)
>>101998961
Yeah it's pretty obvious. Our very own mentally ill blacked anon was doing the false flag thing a while ago after everyone started ignoring and reporting him. The /lmg/ village idiots are a bunch of slippery fish I tell ya!
>>
>>101999062
It would appear this is already my current location within this website.
>>
File: 1000006162.jpg (66 KB, 600x1000)
>>101998875
so over for localbros...
>>
>>101999093
Install linux. Come back when you're done.
>>
>>101999122
I cannot install linux. The last time I tried, in the 90's, it was too confusing. Also I require Windows for work and school. But thank you for the hint that this is only for Linux. Sadly, I will never be able to run it then.
>>
What is the worst model for RP out there in the 7B~13B range?
Asking for research purposes.
>>
>>101998994
>force
you can and always have been able to use whatever system prompt you want, anon
>>
>>101999152
anything phi
>>
>>101999035
Oh I see, thanks.
>>
>>101998988
>placebo
Now that is a word I haven't heard for a while.
>>
File: phimoeinteresting.png (43 KB, 1120x778)
Holy shit you guys. It didn't write a mandelbrot set script. I think Phi-MoE might be AGI.
>>
>>101999138
Do you need to compile from source if you are on windows?
There are pre-compiled binaries in the llama.cpp repo
>>
>>101999138
I don't know if i should insult you or take pity on you.
>https://github.com/LostRuins/koboldcpp
They have pre-built binaries. Read their documentation.
>>
Weeks until next Cohere drop?
>>
>>101999152
Something like OPT-13b or anything else from the pre-llama era if you're willing to step into the pre-llama time. Alternatively any instruct-tune based on early llama1 if we're talking about 'modern' llms.
>>
>>101999231
we would've gotten column-r last week but elon secretly bought it and released it as grok-2 with an x.ai sticker slapped on
>>
>>101999231
cohere realized they can't compete and have been using all the money they got from VCs on sports cars and exotic club drugs
the rest will be spent on organizing the founders' disappearance so they can slip away to new lives under assumed identities on remote south american island compounds
>>
File: phischizo.png (77 KB, 1125x473)
Phimoe definitely suffering from EOS token issues.
>>
phimosis
>>
>>101999350
The same joke crossed my mind. A Phi MoE finetuned to better handle system messages. Phi-MoE-SYS
>>
>>101999360
drummer, get on it
>>
How much worse are 2 3060s vs a 3090? I already have one 3060 and want to get up to 24gb
>>
>>101999329
kek
>>
>>101999350
*keks audibly, a wry smirk on his lips*
>>
>>101999350
Lol
>>
>>101999378
How about a 3060 and a 3090.
>>
>>101999414
I would prefer to buy a 3090, but it's $200 for a 3060 versus $700 for a 3090 and I can't afford the difference right now
>>
Does this count as a Miku/Migu?
>>
>>101999378
Rent them on runpod and see for yourself.
Imo 2x 3060 is better because 3090 will only give you more speed.
>>
>>101997221
>state of the art
Why does everyone claim to be state of the art?
>>
File: 1714835911803032.jpg (951 KB, 1792x2304)
>>101999520
Yes
>>
>>101999520
Close enough.
>>
>>101999541
Because no one would release a model that isn't special in any way whatsoever.
>>
>>101999541
It's for the investors.
>>
>>101999554
Most of the time it isn't, not everything can be special. Most models are just average.
>>
I used to be smarter when I was a teenager; now I'm middle aged and feel stupid as fuck. why can't I wrap my head around simple shit like this anymore? I am going to end it bros.
>>
>>101999583
I didn't get into LLMs until my mid-forties. What's your quandary, millennial boomer-anon?
>>
>>101999530
Runpod doesn't seem to have 3060s available
>>
>the thread schizos could be middle-aged boomers
I don't know how to feel about this
>>
>>101999608
Just a lack of focus; I read the words and don't retain what I've read. I think my brain died from too much MSG and mercury in my tuna. I would have had a lot of fun with this technology when I was young. There should be optional exit booths for people like me: when the mind starts to go, just go to the exit booth lol
>>
>>101999631
Likely younger than that. Schizophrenia usually presents in the early twenties for men. The old timer schizos off their meds are likely too far gone to post here.
>>
>>101999696
NTA, but that's also how I feel, and I'm far from middle age; it was over for me before it even began. I'm lucky I'm just very used to technology from the countless nights on my PC.
>>
File: 1724192663001.jpg (129 KB, 1944x1032)
129 KB
129 KB JPG
doko ("where?")
>>
File: dfghv0.png (41 KB, 527x202)
41 KB
41 KB PNG
>>101995416
>wouldn't be surprised if it wasn't working
add autistic debug prints
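e.g. something along these lines (a made-up minimal sketch; do_the_thing and the messages are placeholders for whatever call you suspect isn't actually running):

# hypothetical sketch: wrap the suspect call and print what goes in and out
def process(item):
    print(f"[dbg] process() got: {item!r}")
    result = do_the_thing(item)  # replace with the actual call you're unsure about
    print(f"[dbg] process() returned: {result!r}")
    return result

If the prints never show up, that code path isn't being hit at all; if they do, you at least see what the inputs and outputs look like.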
>>
I tried mini-magnum-12b, after the good feedback here.

Why doesn't it start?

- Other models work fine
- Memory isn't full
- It doesn't even work on CPU only (where memory really wouldn't be an issue)
- I tried a fresh install
>>
>>101999808
What's your context set to? Nemo has 1M context (1,024,000) as its default config despite going crazy after ~16k.
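Rough math on why that default blows up (a back-of-the-envelope sketch assuming the usual Nemo-class config of 40 layers, 8 KV heads, head_dim 128 and an fp16 cache; double-check against the actual config.json / GGUF metadata):

# assumed config values, fp16 KV cache
n_layers, n_kv_heads, head_dim, bytes_fp16 = 40, 8, 128, 2
per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_fp16  # K and V
print(per_token)                        # 163840 bytes, ~160 KiB per token
print(per_token * 1_024_000 / 2**30)    # ~156 GiB at the 1024000 default
print(per_token * 16_384 / 2**30)       # 2.5 GiB at 16k

So at the default you're asking for a KV cache in the hundred-plus-GiB range before the weights even load, which would explain it refusing to start on both GPU and CPU.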
>>
>>101999808
Try with llama-server. If that works, then your shit is outdated despite doing a fresh install, somehow.
>>
>>101999696
>>101999732
MSG doesn't do that, anon. And I guarantee that as an otoro fiend I've had more mercury-infested tuna at my age than most people could eat over several lifetimes. You might unironically try lifting before you kys. Get your own set of weights, because fuck the gym.
>>101999765
Jibun de yare (do it yourself)
>>101999808
Try ditching booba before you give up
>>
>>101999808
It has long context. If you don't set it to something more reasonable than 128k, you'll OOM. That's my bet, at least. Try setting the context length to 4k and then adjust according to your mem.
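With llama-server, for example, something like this (the gguf filename is just a placeholder for whatever quant you have; -c is the context size, -ngl how many layers to offload):

./llama-server -m ./mini-magnum-12b-Q4_K_M.gguf -c 4096 -ngl 99

If that loads, raise -c until you hit your memory limit.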
>>
File: ComfyUI_01013_.png (1.1 MB, 1256x1024)
1.1 MB
1.1 MB PNG
>>101999350
>>101999368
Model card ready
>>
>playing a shota
>model keeps trying to give me lip about asking for consent and shit even as I'm literally asking for consent
make it stop
>>
File: basedflux6.jpg (364 KB, 1024x768)
364 KB
364 KB JPG
Flux is pretty darn poggers for a local model.
>>
>>101999922
If only you had mentioned the model, so anons could tell you it's a skill issue or to change models. Shame... shame...
>>
>>101999950
>implying anons wouldn't just write "skill issue" like the rubbernecking retards they are
>>
>>101998705
The dataset expanded to become more general (22k instruction samples from Opus as well as 5k creative-writing-specific ones), and certain low-quality entries got pruned from the Stheno set.
>>
>>101999216
ask it to make ASCII art of Miku riding a unicorn
>>
>>101999260
not sure if memeing but would make perfect sense. commander is really uncensored and cohere didn't join that ai safety group so they would be the best candidate to buy considering his le edgy persona. also I would like to say that elon musk is a nigger and I hope he dies soon. everything he touches turns to shit. and i hate him with a passion of a thousand suns now that he touched my dead hobby.
>>
>>101999871
will give it a try.
>>
>>101999926
Fuck that weak-ass holding your own stomach from laughing, help your bro by holding his.
>>
File: 7szz8x.jpg (74 KB, 507x492)
74 KB
74 KB JPG
Having just coomed to an LLM, I am in a state of post-nut clarity. And I am starting to consider whether it would take less effort to just get a girlfriend and groom her into my fucked-up fetishes. The amount of editing, rerolling, and prompting exactly what I want is incredibly tiresome. And now that I am done, I feel hollow. This technology is cursed. It is supposed to be the ultimate form of automation, but when it comes to dick sucking the amount of manual input adjustment is insane. And the lack of clear feedback from changes to your manual input is the cherry on top. In a way, the LLM is the exact opposite of what it presents itself as.
>>
>>102000277
buy an ad
>>
>Magnum 72b runs fast on 48GB at 4bpw but is retarded
>Magnum 123b is good, but you probably get severe quant brain damage if you use IQ2_S instead of just offloading a bigger quant, and it's slow as fuck when not fully offloaded

How do I cope without buying another 3090?
>>
>>102000277
First,
>I am starting to consider if it would take less effort to just get a girlfriend and groom her into my fucked up fetishes
Wrong.
>This technology is cursed
Correct.
>but when it comes to dick sucking the amount of manual input adjustment is insane
Buy an-- I mean, skill issue.
>>
>>102000307
Buying a 4090, of course
>>
>>102000316
Skill issue presence is a function of tolerance to gleaming eyes, complexity of the fetish of your choice, and available VRAM. Blanket skill issue statements are a meme.
>>
>>102000277
Any fucking imaginary girlfriend you are picturing would not be a K-cup titted anime girl. She'd be a vaguely passable, mostly flat bitch OR fat bitch that whines, sleeps, talks, shits and stinks.
>>
>>102000401
But what about real love?
>>
>>101998832
I've been saying this for a while, but they actually have safety alignment in their own dataset, and have had for a while. Their Llama 2 model was a pain to work with, especially the DPO version.
>>
>>102000307
wait for magnum 72b v2
https://wandb.ai/doctorshotgun/
>>
>>102000383
You can pay to rent a machine online and "retouch" a model on the most perverse shit. Complexity of a fetish is, in fact, mitigated by having more material on the subject to feed the electronic demons jailed in their silicon prisons.
Skill issue is, in fact, very real.
>>
>>102000481
I hope a spontaneous cockblocking paladin appears in your next ERP.
>>
>>102000514
Imagine how funny it'd be if you had a superpower and it was the ability to appear in other people's fantasies as a cockblocking paladin.
>>
>>102000433
AHAHAHAHAHAH
L M A O


Anon, you already lived it, and it's gone. Every love from then on is only meant to help you forget.
>>
>>102000514
I would honestly welcome it, considering I like when spontaneous bullshit happens. As long as cucking is not involved.
>>
>>
>>102000590
>cucking is not involved.
My LLM spontaneously brought up cucking recently. I wasn't happy.
>>
>>102000619
>mike plate on the wall
At last the truth is revealed. It was always Hatsune Miku(male).
>>
>>101990712
Hermes 8B is so trope-ridden and broken. All elves must have raven-black hair. Tell it not to do that and it says "raven-black tresses" etc. It regularly ignores instructions.
It's slightly better at a more novel-like style without constantly recapping or trying to conclude every output, but it's still unusable to me.
ERP fags with chatbots might like it, but it sucks for creative writing.
>>
I'm back after a long, long time; I've been following the news but not really trying new models. The last thing I tried was Llama 3 8B when it came out. I know about the 3.1 series and Mistral Large. What's a good 70B model for RP? I've been using miqu for the past year.
>>
>>102000683
miqu
>>
File: GU1TYARbsAAZUb_.jpg (387 KB, 1720x2273)
387 KB
387 KB JPG
>>101990712
Teto my beloved

https://www.youtube.com/watch?v=satZx43Sv_0
>>
File: the suffering.jpg (54 KB, 474x604)
54 KB
54 KB JPG
>>102000627
Well, fuck.
>>
>>102000706
So ERP locally is as dead as always? Great
>>
>>102000754
Yeah, local is in a lull right now; we're all just kind of huddling around waiting for the big release on November 5th.
>>
>>102000662
All LLMs suck at creative writing. Letting porn brained idiots ERP without bothering a real human is the most noble pursuit this technology is or ever will be capable of.
>>
Will there be a Magnum v2 405b?
>>
>>102000777
>porn brained
if you don't have a girlfriend and get horny what are you supposed to do?
>>
>>102000854
Hopefully either use it as incentive to get out and meet new people or go into a self-improvement cycle
>>
>>102000911
you zoomers are really fucked in the head.
>>
>>102000307
>I only use Magnum models
I still can't tell what the 123b one adds compared to Large.
>>
>>102000911
Hm...
>hard path of self-improvement in order to deal with human-based bullshit and waste hundreds of thousands on maintaining a relationship
vs
>easy path to inner peace by removing all connections with human-based bullshit and saving the hundreds of thousands needed for the sexbots of the future
I dunno...
>>
I need you to explain it to me like I'm five. Using kobold and ST, it used to be that the AI would need time to "read" a post and then time to reply. Now, however, it seems the AI needs no time to read at all? How does that work?
>>
>>102000854
To be porn-brained isn't just to make use of porn; it's more having used porn to the point that your expectations and standards have been warped beyond repair. It's the kind of brain damage that leads to people leaving comments in the Pornhub comment section.
>>
File: V100price.png (491 KB, 1623x638)
491 KB
491 KB PNG
>The 32 GB V100 is still $700+ and needs a specialized server, and it's even worse if you try to find the PCIe ones you can use in a regular system.
VRAM starvation is no joke. Please, someone just bump the limit on VRAM you can get in a reasonably priced PCIe card to 32 GB; I don't have 2k USD to burn.
>>
>>102001024
I think the worst part about porn addiction is all those countless hours you waste jerking off instead of doing something productive. It is horrifying. Especially when I also consider how many 4chan posts about porn addiction I could have written in that time.
>>
>>102001018
prompt processing still takes time, but you are probably referring to streaming or caching
>>
>>102001067
That's still an improvement. They were all ~$900 when I was checking last week.
>>
>>101994638
How did Mistral do it? Best small model, best large model.
>>
What's the best way to raise temp but keep the bot on track? I find writing becomes a lot more detailed and hotter with higher temps.
>>
>>102001133
>>102001133
>>102001133
>>
>>102001018
That sounds like context shift working for you. The prompt is cached up to the point where new information is added to it (so dynamic stuff like lorebook injections in a character card can result in the cache barely helping at all). In other words, you're only waiting long enough to process the newest part of the prompt instead of the full context.
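Very roughly, the idea is something like this (a toy sketch, not llama.cpp's or kobold's actual implementation):

# toy illustration of prompt caching: only the tokens after the longest
# shared prefix with the previously processed prompt need processing again
def tokens_needing_processing(cached_tokens, new_prompt_tokens):
    shared = 0
    for old, new in zip(cached_tokens, new_prompt_tokens):
        if old != new:
            break
        shared += 1
    return new_prompt_tokens[shared:]

If nothing near the start of the prompt changed, that returned slice is just your latest message plus the turn formatting, which is why the "reading" step feels instant.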
>>
rentry co/xtfqvv4h
Help me, anon. You’re my only hope.
>>
File: how.png (779 KB, 1619x638)
779 KB
779 KB PNG
>>102001125
They aren't dropping fast enough. The PCIe versions are 2x the price, they're selling at the same price as AMD's old workstation and datacenter 32 GB cards, and it's all because of CUDA lock-in. The situation is just sad, man.
>>
>>102001163
Overfitting is all you need.
>>
https://huggingface.co/MangoHQ/TinyMagnum-4b

leaked magnum model?
>>
C'mon, where's your pasty? I know you're lurking here, humiliated.
>>
oh nvm it was a rentry this time, almost missed it
>>
>>102001479
Thank you. And sorry, pastebin was down at the time.
>>
>>102001544
sorry's not gonna cut it, hand over the miku
>>
File: Luka.jpg (18 KB, 296x256)
18 KB
18 KB JPG
>>102001562
I'll do you one better and give you this Luka. Seriously, thanks again.
>>
>>102001612
yes... goooood... safe travels recapfag


