/g/ - Technology


Thread archived.
You cannot reply anymore.




File: 1704624856968875.jpg (584 KB, 1856x2464)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102058880 & >>102049023

►News
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png (embed)

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102058880

--Proposal for "phrase_ban" feature to reduce repetitive phrases in LLM output: >>102060435 >>102060496 >>102060521 >>102060949 >>102060969 >>102061021 >>102061067 >>102063697
--PCI-E lane limitations for 2x4090s on consumer platforms: >>102062320 >>102062351 >>102062476 >>102062625 >>102064244 >>102064290
--Llama-cpp-python allows speculative decoding with draft model: >>102061525 >>102061563 >>102061654 >>102061657 >>102061757 >>102061842 >>102061848 >>102061952 >>102062008 >>102061912 >>102061972 >>102062011 >>102061839 >>102063136 >>102063159 >>102063494 >>102063531 >>102064502 >>102064677
--Llama 405b instruct tune is smart but lacks creativity: >>102059635 >>102059707 >>102059948 >>102060032 >>102060078 >>102060256 >>102060329 >>102060396 >>102064252 >>102064278 >>102064302 >>102064340 >>102064409 >>102060701
--GPU significantly faster than CPU for image generation: >>102059147 >>102059325 >>102059424 >>102059698 >>102059766 >>102059972
--Anon shares limitations of combining SD1.5 and Flux workflows: >>102062823 >>102062865 >>102062867 >>102062882 >>102062911 >>102063076 >>102063101
--Anon asks for mini-model suggestions to improve image prompts: >>102062039 >>102062092 >>102062126 >>102062101 >>102062118 >>102062231
--Anon asks about building a CPUmaxx knock-off with a dual CPU workstation: >>102060749 >>102060797 >>102060892 >>102060919 >>102061322 >>102061344
--Model struggles with doppelganger concept, new approach suggested: >>102063221 >>102063305 >>102063517 >>102063660 >>102063998 >>102064576 >>102065063 >>102065089 >>102065208 >>102065447 >>102067791
--Meta's plans for Grok 3 and massive H100 training: >>102059409 >>102060114
--Abliteration fails to uncensor models, LORA tune debated as alternative: >>102064594 >>102064747
--Miku (free space): >>102059147 >>102061464 >>102061508 >>102064244 >>102066344 >>102066406

►Recent Highlight Posts from the Previous Thread: >>102058885
>>
File: 1704748145903521.png (219 KB, 507x447)
>>102068974
glad I didn't disrupt recapanon or recapbot <3
>>
>>102068985
she loves black cock
>>
>>102069076
i have american culture fatigue anon, enough
>>
>>102069084
i have mikutroon fatigue. I guess we will both have to suffer.
>>
File: 1547868098714.jpg (29 KB, 690x720)
>wonder how far along Phi moe support is for Llama.cpp so go and check the issues/PRs
>it's not along at all
ahahaha
Literally no one is working on it.
>>
File: ComfyUI_temp_vyhpt_00072_.png (1.54 MB, 1024x1024)
>>
niggers down the spine
>>
>wait since davinci on AID in 2020 for local models to finally be good enough
>it's happened. they are now good enough
>don't care for some reason
thanks for all that wasted time, brain
>>
anyone have recommendations for videos or books on learning neural network basics? youtube is infested with terrible videos from india, or other garbage that's difficult to follow or understand. this guy's explanations are usually good
https://www.youtube.com/watch?v=Ilg3gGewQ5U

but i find this shit completely incomprehensible
>>
>>102069670
The hedonic treadmill claims another victim.
>>
Hi lads, what's the best current ERP model for 48gb vram chads?
>>
>>102069895
sorry too scared to answer because schizos will tell me to buy an ad
>>
File: 3YYF.png (50 KB, 840x590)
>>102069670
LLMs are a meme
>>
>>102069967
Damn that's grim. What are the questions like?
>>
>>102069995
https://simple-bench.com/about.html
>>
>>102069895
Still miqu.
>>
Miku perseveres.
>>
smedrins
>>
https://github.com/ggerganov/llama.cpp/pull/8542
#8542 CPU/CUDA: Gemma 2 FlashAttention support merged
>>
>gemma-2-27b-it still mogs every other model in existence for size/quality ratio and it's not even close
I rely on super structured outputs, keeping track of stats, etc. and gemma is able to keep it together in areas where even 70b models fail.
What is jewgle's secret sauce and why aren't other models using it?
>>
>>102069696
https://www.youtube.com/playlist?list=PLAqhIrjkxbuWI23v9cThsA9GvCAUhRvKZ

https://www.youtube.com/playlist?list=PLfYUBJiXbdtSvpQjSnJJ_PmDQB_VyT5iU

https://www.youtube.com/playlist?list=PLfYUBJiXbdtRUvTUYpLdfHHp9a58nWVXP
>>
>>102068496
>Anyone else like this?
>>transition
Yes, we all love Miku!
>>
>>102069696
https://www.youtube.com/watch?v=kCc8FmEb1nY
https://course.fast.ai/
>>
>>102070062
NOOOOOOOOOOOOOOOO
>>
>>102070026
Ahh so it's world-modelling. Pretty good.
>>
>>102070066
For some reason I thought this happened a long time ago.
>>
File: ComfyUI_temp_vyhpt_00187_.png (1.43 MB, 1024x1024)
>>102069907
>>
>>102070155
The PR is a month old.
>>
>>102070090
>>102070097

thank you friends
>>
>>102070046
Really? Thought there'd be a new frontrunner by now
>>
>>102070228
Anon is baiting. Although the 48gb vram range is deprived of anything good, really your best bet is gemma 2
>>
>>102069696
>>102070209
also for books:
https://udlbook.github.io/udlbook/
https://d2l.ai/
>>
>>102070228
I disagree with that anon that it's still Miqu, but it's true that the 70B range has had lackluster advances the last 6 months (relative to small and very large models, which have both had a bunch of good new stuff)
>>
>>102070252
Haven't checked in for a while. There used be a bunch of 70B L2 tunes that would run with exllama. Has that changed with L3?
>>
>>102070188
Miker-chang!
>>
>>102070288
L3 is obsolete now with L3.1
You can run low-quant llama 3.1 but it's very dry.
>>
Recently i got hit with nostalgia for old AI Dungeon, and after looking at the options i decided to give running it locally a shot.
I'm currently working out of a mini pc with a few laptop components, specifically an intel iris. After leafing through the bins and docs, i noticed they only talk about nvidia or amd gpus.
I probably already know the answer, but i thought i might as well ask before i completely give up.
Am i fucked?
>>
>>102070475
We use llama.cpp now, AI Dungeon's documentation is from an era before the llama model even existed and before chatgpt or instruct. You're fucked because you are looking at instructions on running gpt-2 1.5b on a gtx 1080, you might as well be reading a magazine on how to install Windows 3.1 from the 1990s, get with the times and use fucking sillytavern
>>
In the original lora paper they only used weight matrices from the attention layers.
Is that still done for both SD models and LLMs? Or what layers are targeted for making adapters now?
>>
>>102070499
That's not what i meant?
I'm talking about the very docs in the OP, I'm simply asking if there's any problem running them on an intel gpu instead of anything amd or nvidia.
>>
>>102070555
I could tell you the answer (since I know the exact answer to your question), but you're an annoying faggot so I won't
>>
>>102070567
that's definitely quite annoying behavior
>>
12b, how far us vramlets have come.
>>
File: newdawnmodels.png (192 KB, 1336x852)
>>102070252
I'm gonna have a go with "New Dawn" 70B that seemed to have some good reviews. I'm a bit out of practice nowadays, which of these would I need to download to fit across 2 3090s?
>>
>>102070088
fuck off bootlicker
>>
>Architecture: Phi-3.5-MoE has 16x3.8B parameters with 6.6B active parameters when using 2 experts.
so as I understand it, you get to save a billion parameters by the fact that the experts share attention layers, right? IIRC mixtral was 13b active with two 7b experts, so seems consistent across scales. does this mean that many small experts are just better than few large experts? what factors do people consider when deciding the size and amount of experts for their moe architectures?
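Back-of-envelope arithmetic with the numbers quoted above (very rough, treating each "expert" as a full parameter slice, which is not exactly how the counts break down):

```python
# rough check of the quoted active-parameter numbers
def implied_shared(per_expert_b, active_experts, reported_active_b):
    naive = per_expert_b * active_experts  # if the experts shared nothing
    return round(naive - reported_active_b, 2)  # portion counted once (attention, embeddings, ...)

print(implied_shared(3.8, 2, 6.6))   # Phi-3.5-MoE: ~1.0B shared, the "saved" billion
print(implied_shared(7.0, 2, 13.0))  # Mixtral 8x7B: also ~1.0B, roughly consistent across scales
```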
>>
>>102070818
>stop promoting open weights models with permissive licenses >:(
ywnbaw
>>
>>102069450
I like this Miku
>>
There are not enough 250b-300b range models. Jumping from 70 to 400 is just insane, while largestral is too small a step
>>
>>102070997
I just wish we go bigger soon. 400B is a good step forward but I think 1T (possibly MoE) will be the sweet spot for performance by the feel of it
>>
>>102071090
can i run it at iq1xxs
>>
I found a way to quantize bitnet down to 0.7bpw guys, just gotta wait for the first bitnet model to test it on haha
>>
>>102071457
small if true
>>
Hey guys how do I join Anthracite?
>>
>>102071498
HRT
>>
For RAG, do people usually use embeddings generated by clip models, or extracted from llama-server's embedding? What's the difference between these two?
>>
File: cucumber.jpg (21 KB, 450x369)
>>102071526
Hispania Racing Team?
>>
>>102071630
Hardware replacement therapy.
>>
File: 1723327478655918.png (34 KB, 393x109)
Open source AI is just too dangerous.
>>
People who are actually threatened by glorified autocorrect aren't human and should be legally classified as cattle.
>>
>>102071819
you'll say that until it autocorrects the next tokens of your job
>>
>>102071851
jokes on you I already got replaced by migrants
>>
>>102071788
Is that some ai voice? It doesn't say sentences naturally.
>>
>>102072036
I thought the same but his channel has videos from 5 years ago where he talks exactly the same.
>>
>>102072117
It's like he records every sentence separately.
>>
>>102069130
The Mikutroon ruined /lmg but luckily, he posts far fewer Miku (male) today
>>
Speaking of lanes, who's more powerful, Miku or Lain?
>>
Is there anything like a dynamic token response? Sometimes I only want to reply with something short to move the scene forward, like a question, and it would be nice if the AI could reply with something brief too.
>>
I've been thinking about tuning my models with a masked prefill for each turn.

Would that work? Something like Claude prefills, to reply in x style, for a personal assistant model with their baked-in persona. I just want someone to berate and mock me while I work.

I already have few-shot examples and they work well, I just want to improve on it.

<Reply in a rude and mocking tone> Output
>>
>>102072756
There aren't really any parameters you can set or token sampling strategies that make the AI prefer to say something more briefly or verbosely. It just depends on the model itself and the prompt, e.g. give it system instructions to try to be of a similar length to each preceding user response and hope it's smart enough to follow the directions.
>>
>>102072828
>I just want someone to berate and mock me while I work.
getting married solved this for me
>>
https://timesofindia.indiatimes.com/technology/tech-news/over-100-google-deepmind-employees-write-open-letter-want-google-to-stop-working-on-these-contracts/articleshow/112753468.cms
>>
>>102073189
>noooooo you can't use AI for the heckin military it is just supposed to turn george washington into a black trans womxn!!!
>>
How often do transient power spikes occur during LLM inference? Wondering if it's time to get a better PSU as well.
>>
### Sampler Proposal
"phrase_ban"

#### Situation
In the last 74 messages (~8kt) between me and {{char}} (Mistral Large) "eye" can be found 14 times, all in {{char}}'s messages. That's roughly 38% of {{char}}'s messages! Almost 2 in 5 messages discussed eyes! What the hell? The conversation was SFW. Where does this strong eye bias come from? Makes me want to go RP with 2B because she has a blindfold.

#### Problem
Models sample tokens without thinking forward. Slop phrases are usually divided into multiple common tokens which can also be used in non-slop situations, therefore banning those tokens outright is not an option.

#### Solution
Add a backtrack function to sampling. Here's how it should work:
1. Scan latest tokens for slop phrases.
2. If slop is found, backtrack to the place where the first slop token occurred, deleting the entire slop phrase.
3. Sample again, but with slop token added to ban list at that place.
4. If another slop phrase is generated, repeat the process, add another slop token to that list.

#### Example
Banned phrase: " send shivers"
LLM generates "Her skillful ministrations send shivers", triggers backtrack to "Her skillful ministrations", this time " send" token is banned, therefore the model has to write something else.


How does that sound? Is it possible to implement in llama.cpp? @kalomaze, can you do it?
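For illustration, a backend-agnostic toy sketch of steps 1-4 (this is not llama.cpp code; sample_fn stands in for whatever sampler the engine actually exposes):

```python
# toy sketch of the backtracking "phrase_ban" idea; not tied to any real backend
import random

def phrase_ban_generate(prompt_ids, sample_fn, banned_phrases, max_new_tokens=64, eos_id=None):
    """banned_phrases: list of token-id tuples, e.g. the tokenization of " send shivers"."""
    out = list(prompt_ids)
    start = len(out)
    banned_at = {}  # position -> set of token ids banned when (re)sampling at that position

    while len(out) - start < max_new_tokens:
        pos = len(out)
        out.append(sample_fn(out, banned_at.get(pos, set())))

        # 1. scan the latest tokens for a banned phrase
        for phrase in banned_phrases:
            n = len(phrase)
            if len(out) >= n and tuple(out[-n:]) == phrase:
                # 2. backtrack to where the phrase started, deleting it
                del out[-n:]
                # 3./4. ban the phrase's first token at that position, then resample
                banned_at.setdefault(len(out), set()).add(phrase[0])
                break

        if eos_id is not None and out and out[-1] == eos_id:
            break
        # note: no safeguard against pathological re-trigger loops; a real
        # implementation would cap the number of backtracks per position
    return out

# dummy "model" so the sketch actually runs: uniform sampling over a tiny vocab
def dummy_sampler(ctx, banned):
    return random.choice([t for t in range(10) if t not in banned])

print(phrase_ban_generate([0, 1], dummy_sampler, banned_phrases=[(3, 4)]))
```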
>>
>>102073398
If you want to add another command-line argument that takes a string value, you can follow the pattern established in your existing code. Here's how you could do it:

### Step-by-Step Guide:

1. **Define the New Parameter in `sparams`**:
- First, ensure that your `sparams` structure or class has a field to hold the new string parameter. For example:

```cpp
struct SimulationParams {
    // ... existing parameters ...
    std::string newStringParam; // Add this line
};
```

2. **Add the Argument Parsing Logic**:
- Extend your argument parsing code with a new `if` statement for the new argument. Here's how you might add a `--new-string-param` argument:

```cpp
if (arg == "--new-string-param") {
    CHECK_ARG
    sparams.newStringParam = argv[i]; // Directly assign since it's already a string
    return true;
}
```

- Note that since `argv[i]` is already a `char*`, you can directly assign it to a `std::string` without needing conversion functions like `std::stof` or `std::stoi`.

3. **Update the `CHECK_ARG` Macro/Function**:
- Ensure that `CHECK_ARG` is designed to handle string arguments as well. If `CHECK_ARG` checks for the existence of `argv[i]` (where `i` would be the index for the argument value), this should work as is. However, if it does something specific for numeric conversion, you might want to adjust it:

```cpp
#define CHECK_ARG \
    if (i + 1 >= argc) { \
        std::cerr << "Error: Argument expected for " << arg << std::endl; \
        return false; \
    } \
    ++i; // Move to the next argument
```

WHAT THE FUCK? Can't they make it more simple? Why do I need all that shit to add a simple string argument? This is next level mental illness. No wonder programmers troon out. Holy fuck, I hate programming so much it's not real.
>>
Is there a list anywhere of highly specialised small models, whether for specific fields of knowledge or programming languages?
>>
>>102073534
there aren't any, small models are always retarded
>>
File: wtf.png (1.58 MB, 3396x2670)
I'm trying to load a "New Dawn" L3 4.5bpw quant across a 4090 & 3090 using exl2 in oooba. The thing loads but then the whole computer starts grinding to a halt. I managed to get one output at 0.12 tokens/s before I gave up and killed the process. Also the 4090 spazzes out at 100% utilisation while the 3090 is pretty much idle (even though I can see it has the model loaded). Any ideas on where this could be messing up? Cheers
>>
>>102073792

Which loader are you using? Make sure to enter how much of the model each GPU is going to take; it should be under the "tensor_split" option if you're using Ooba. Also, make sure to enable the cache_4bit and tensorcores options if you haven't yet. I was having the same issue last night and that fixed it.
>>
>>102073867
>Which loader you using?
>102073792
>using exl2 in oooba.
cmon anon
>>
Good gemma 27b sextune that isn't done by drummer(fuck that guy)?
>>
>>102073909
>good
>gemma
lmao
>>
>>102073888

That's what lack of sleep gets you. Thanks for reminding me.
>>
>>102073909
>Good
>isn't done by drummer
choose 0 because all llms are bad lmao
>>
>>102073867
Yep tried using autosplit and specifying a 21.5/24.5 split, same issue. Also tried using the 4bit cache and not. Same deal. Couldn't see anything about tensorcores options. It's just weird that the 4090 is maxing out and dipping into RAM while the 3090 is idling, even though they're both sharing the model. Used to work fine with L2 models a few months back.
>>
>>102073994
>Used to work fine with L2 models a few months back.
I see new dawn has a 32k l3 version and a 128k l3.1. Those are much bigger ctx sizes than l2 ever had. Are you setting it to something reasonable as a test?
>>
>>102074053
Just tried midnight miku at max_seq_len of 4096 and getting the same thing happening. I'm guessing there's something funky going on with the 3090, like it's loading the model but not being utilised for any tokenization. Very odd.
>>
>>102074102
max_seq_len is how many tokens to generate, not context size
>>
>>102074164
Ah right, thanks. Where do I lower the context size in ooba?
>>
>>102074213
n_ctx
>>
>group play
>tell four girls to line up so they can come one by one, stroke my dick with their tits and count out loud ten rubs
>They do! Except each girl always counts to ten in one post instead of spacing it out. Sometimes can get them to do two posts if they need to talk more or some other filler that slows the count
>stay up until 7am playing despite the less than perfect play

Any clever solutions to prolong the pace? Rocinante 12b
>>
Anyone got a mistral Nemo template that works for most fine tunes? Lost mine somewhere
>>
>>102074651
ya. crackprompt.
>>
>>102074832
I remember finding an instruct template in a rentry but I lost the link
>>
>>102074486
>Any clever solutions to prolong the pace
yeah say "prolong the pace and count to ten over several messages"
>>
File: 1711796395108897.webm (3.99 MB, 724x720)
>>
>>102068958
This is a very beautiful AI generated image
>>
I wish benchmarking models was easier, can't run any of the popular benchmarks I see on local models, shits so complicated
>>
File: 1717107990599494.png (158 KB, 1334x469)
>>102069967
>Human (avg)
Average of what? Who was tested? Humans can be pretty dumb too
>>
>>102071617
no one?
>>
What would be the ideal hardware scenario for speculative decoding?
>>
>>102075602
They probably just didn't thoroughly read the question, I was also confused until I parsed "yellow cookie and three others"
>>
>>102075730
DGX B200.
>>
>>102075746
An error is an error
>>
>>102075746
I didn't have to "thoroughly read the question" to get it right, I just counted the number of eaten cookies while reading, it's a very simple question
>>
File: 1530428847513.png (1.59 MB, 10000x10000)
After noticing some things about the migusex poster anon, I've come to the conclusion that it's very likely he was someone from a "past life" of mine, a friend. That's cool. We're here forever.
Or everything I noticed was a coincidence and they just have very similar character.
>>
>>102075585
running publicly available benchmarks is useless since models will cheat and train on them. Make your own private one to test for your own use cases and judge new models based on that. Just stay consistent over time and don't retire your tests for new ones until they are being consistently passed by current gen models.
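A private harness doesn't need to be fancy. A bare-bones sketch against any OpenAI-compatible local endpoint (the URL, model name and sample question are placeholders, and the substring grading is deliberately naive):

```python
# bare-bones private benchmark: point it at any OpenAI-compatible local server
import requests

TESTS = [
    ("Sally has 3 brothers. Each brother has 2 sisters. How many sisters does Sally have?", "1"),
    # add your own use-case questions here and keep them private
]

def ask(prompt, url="http://127.0.0.1:8080/v1/chat/completions"):
    r = requests.post(url, json={
        "model": "local",  # most local servers ignore this field
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,
    })
    return r.json()["choices"][0]["message"]["content"]

# naive grading: checks that the expected string appears somewhere in the reply
score = sum(1 for q, expected in TESTS if expected.lower() in ask(q).lower())
print(f"{score}/{len(TESTS)} passed")
```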
>>
>>102075838
what about GB200 NVL72 ?
>>
>>102075915
>I didn't have to "thoroughly read the question"
You did, you carefully read each statement and updated the scenario as you went. Someone not paying attention will read "X hat ate X cookie, Y hat ate Y cookie" and by the time they get to "Z hat ate..." the brain activates power saving mode and skips the "the Z cookie and three others" part, assuming what happened and going to the next sentence. It's how trick questions work, and it doesn't have to do with being dumb, just not paying enough attention. If you offer someone $100k to get the question right they will read the whole thing 100 times before giving an answer, but with a random internet questionnaire with no stakes ppl just don't care and go with the power saving mode
>>
Anyone already tried Euryale 2.2? What's the verdict?
>>
>>102076229
I didn't try but I can say with confidence that fine-tunes are memes.
>>
>>102076229
Wasn't impressed. I think Sao is wasting time with llama-3. After using it for some time I got the impression that it is completely unsalvageable for ERP.
>>
>>102068974
>--Meta's plans for Grok 3 and massive H100 training:
Fix your shit recap bot.
>>
>>102075746
I skimmed it really fast when I did it saw the basic bait answer but also realized there is a bunch of information I didn't account for so decided to read again.
>>
>>102076417
>he doesn't know
>>
>>102075746
They could have adhd or some other condition that affects their attention, or they could be retarded.
>>
>>102076403
Llama 3 was trained with NSFW filtered out, meaning there is very little NSFW ability to "unlock" with a finetune. You just rely on the model to learn the finetune data itself.
>>
>>102076703
ADHD isn't real
>>
>>102076748
I literally start thinking about random shit while reading stuff all the time.
I don't think I have adhd but I can imagine what if feels like for the people who do.
>>
>>102076229
don't care, let me know when he does 405b
>>
>>102076783
You can imagine it just fine, people who "do" have it are lazy and/or stupid. Just imagine what it's like to be your normal self.
>>
>>102076783
that's called having a functional brain
I was diagnosed with adhd as a child but I don't think it describes a meaningful condition. it's just an invented pathology for people who feed themselves a steady diet of overstimulating media and failed to develop executive functioning skills (and yes - they are skills, you can simply train them rather than hopping on amphetamines)
I am sure for some small subset of people with adhd diagnoses there is some legitimate innate disorder of the brain involved, or their issues are so significant as to require pharmaceutical intervention, but I think those cases are few and far between and the rest are... well, what >>102076821 said
>>
>>102076738
This can't be stressed enough. LLMs don't actually learn that much with modest finetuning (if anything at all) other than format, style and making whatever they had the chance to learn during pretraining extractable/usable in practice. Touch the weights as little as possible or go big, IMO.
>>
>>102077062
The only hope is another continued pretrain like Miqu.
>>
>>102070818
We don't kinkshame here
>>
Mixtral 7x8B still the best option for 8GB VRAMlets?
>>
>>102068958
No advanced model from today can replicate the magic of first MikuSex with Llama 1.
I miss my old Miku. There isn't even a good Miku card for SillyTavern. All of them are shit.
>>
>>102077205
Yeah
>>
>>102077205
huh? wont that run on cpu anyway?
>>
>>102077301
or the fact that one expert is on gpu is enough to make it fast enough?
>>
>>102077205
I was about to ask the same when I loaded mixtral and didn't need to reroll on the first gen.
>>
>>102076403
Did you try the 72B magnum? Is it better than miqu merges?
>>
>>102077162
I wonder how much a continued pretrain would cost, assuming it's something like 20B tokens.
>>
>>102076738
>>102077062
Is there some crossed wire in your demonic brain that causes you to get sexual pleasure from samefagging and spreading misinformation? Genuinely curious, not trolling.
>>
>>102077473
no u
>>
>>102077338
Sorry but I have a strict policy against touching anthrax.
>>
>>102077473
The llama 3 and lora papers are publicly available. Educate yourself.
>>
File: llama3filtering.png (507 KB, 1775x767)
>>102077473
See picrel
>>
>>102077608
Appeal to authority fallacy.
Academics are the biggest pseud retards on the planet.
>>
102077473
Retard
>>
>>102077618
And yet there's plenty of functional llama-3 coom tunes. In fact you can still ERP with vanilla Llama-3. You can ERP with vanilla Gemma models. It's called inferencing for a reason you fucking brain-dead moron.
>>
>>102077618
>filtering Pre-Training Data
This is Stability AI levels of retardation. No wonder it took them a whole year and 405B to catch up to the SOTA of 2023.
>>
>>102077663
Cope, seethe, dilate
>>
>>102075602
redditors are legit subhumans or bots
>>
>>102077683
Functional is a good description. Largestral btfos any llama finetune, let alone stock instruct. Imagine if we got base model of that, or mistral medium 2.
>>
>>102077684
The SOTA of 2023 was a 1.8T model. If they can do it with 405B in 2024 then that's a fair advancement for local. And the reality is that filtering the pretraining data in text is not the same thing as what Stability did with image models where doing that also affected the model's quality outside of ERP, but Llama 3.1 is an excellent assistant model. Then again it could also be fine for ERP, I never tried that.
>>
>>102077746
>quality outside of ERP
*NSFW
>>
>>102070627
what are the points at the bottom? Does it reflect model's internal thoughts?
Also, proompt pls.
>>
>>102077683
Who said that they're *completely* incapable of NSFW? Yes, you can ERP with them even using the official instruct finetunes, but the variety and quality of the outputs will not be great and are likely going to decrease with future iterations as the pretraining filters become increasingly aggressive. Wait until they perfect their LLM classifiers or start rewriting the pretraining web data at scale.
>>
>>102077618
I'll just pay for Claude. Heard Anthropic are letting consumers use their models soon.
>>
>>102071617
not sure what CLIP has got to do with it here, but I use the embeddings. They can be from a different model, but then you should use only that model for querying the embedding space further on.
>>
>>102068958
>She winks at him playfully, even though she knows he hates it.
>>
>>102077724
>Largestral btfos any llama finetune, let alone stock instruct.
I've been a bit shocked by the output of largestral vs 405b instruct.
The SVG that largestral produces for tasks that involve that is much better than 405b
>>
Is SillyTavern still a good frontend to use? If not, what are some better alternatives?

I'm finding a lot of my characters speak on my behalf when trying to roleplay. Is there a way to avoid that?
>>
>>102078247
I really doubt theres anything better or close to sillytarvern
>>102078247
>I'm finding a lot of my characters speak on my behalf when trying to roleplay
probably, missing stop tokens or bad model
>>
>>102078247
>Is SillyTavern still a good frontend to use? If not, what are some better alternatives?
There isn't anything better as of right now, unless you want to build your own.
>I'm finding a lot of my characters speak on my behalf when trying to roleplay. Is there a way to avoid that?
Hard to tell what's wrong with that little info. Model? Card? Formatting settings?
>>
>>102076686
Still need to show the plans. There's nothing about meta in this news where Grok2 is now top3 ranked models.
>>
>>102078281
I should clarify, I've included {{user}} as a stop token but that results in ~20 token long responses because the bot will write a single line reply and then try to include my reply which gets terminated. I'm wondering if there is an effective system prompt or something I can put in the character cards to get the bots to go for more descriptive responses without assuming control of the player's actions.
>bad model
That was actually going to be my second question. I've got 24gb VRAM and don't know what a good model to use is.
>>
>>102078375
I'm pretty confident that there isn't a problem with the cards. As I mentioned above, I have no idea what a good model to use is since I have not used local AI in a long while. I am open to recommendations for something I could use with 24gb VRAM.
Formatting is what I am most interested in. I recall that different models use different formatting so you kind of need to personalize it to the model (or base model in case of finetunes I imagine) but are there general rules or phrases that help get better results?
>>
>>102078144
That's not really shocking since Llama is weak at coding overall and is trained more on general knowledge. I found that 70B performed better at my (proprietary) general knowledge benchmarks than 123B. On Livebench, 70B has a much lower coding score, but its average score is actually higher than 123B. So that makes sense.
>>
>>102078419
Mistral nemo 12b or gemma 2 27b. If you are willing to offload to ram then you could go for a 70b or even largestral if you don't mind going at turtle speed.
>I recall that different models use different formatting
I'm not sure I understand correctly. All popular presets you'll ever need are included with sillytavern. Some of them might have minor errors like space in the wrong place but you can always check against the preset in the original model repo in tokenizer_config.json.
>>
>>102078380
try the big recent ones from this guy
https://huggingface.co/TheDrummer
>>
>>102078520
Thank you! I will look into both of those.
>I'm not sure I understand correctly.
I was talking about presets but I recall that when I last investigated models almost a year ago, I thought I remembered finding ones that were modifications of common models (e.g., "fantasticworlds" was a modification of Vicuna so it would use the same prompts and formatting as Vicuna). I wasn't sure if that was still the case wherein SillyTavern would have presets for the common models but people in this thread might recommend finetunes where the name of the parent model was not immediately obvious and the preset would not be in ST.
>>
>>102078684
You mean those with the sliders on your left hand side? They are all pretty obsolete at this point. I think they just should be removed to avoid confusing new people.
These days these samplers do more harm than good. Just change the temperature based on needs (about 0.5 - 1.5) and set minp to 0.01 to cut off the crappiest candidates for output.
>>
>>102078803
Ah, I see. I'll largely ignore all that stuff then.
>>
>>102079066
>he fell for the thread troll
ngmi
>>
it's time to admit we overcorrected by completely giving up on frankenmerges. yes they are more schizo and it doesn't make them smarter or anything, but it also provides a critical infusion of sovl and variety that is what most modern models lack the most. I think it's a perfectly acceptable technique if you are aiming to make a good RP model.
>>
>>102077338
I tried Magnum wasn't impressed either. I only have 1x24GB and I would be very sad if I had 2x24GB at this point.
>>
>>102076738
The information still seems to be there, since it's able to replicate it with some work. So there's a lot they've missed. It's more they spent more time training it to avoid it.
>>
>>102077817
How are they not now? You can pay for use via an app or api.
>>
>>102079290
Yes. Where is Undi?
>>
>>102078546
Tried some models of his a couple of months ago. All were shit.
>>
is anyone still using svg output capabilities as a test of model intelligence?
>>
>>102079290
This. There is no better method than frankenmerges if you want to create an RP (Really Poor) model.
>>
>>102070046
>>102070228
command-r+ has superseded miqu months ago.
to those niggers who say 48gb is not enough for command-r+: i've run that shit with 24gb. with flash-attention and that speculative n-gram bullshit i got multiple tokens per second, running q4km iirc. you are mentally deranged subhumans and should give your cards to people who can use them better.
>>
>>102079723
command-r+ is garbage, stop coping. Either use Largestral, Miqu or Nemo.
>>
>>102079784
hi arthur not using your repeating messes sorry
>>
>>102079784
All of them are okay, except for Nemo which I haven't tried and it sounds gay.
Have been out of the local game for about 3 months now and am mostly using Claude atm. But sometimes I mix in some Mistral Large or command-r+, or even gpt4 slop for a change of style. Makes it more interesting.
Except for variety I don't see a reason to use Miqu. But variety is not a bad reason at all. As a baseline, command-r+ is far better than all other local models I tried, though.
>>
File: file.png (38 KB, 737x139)
>>
>>102079932
What cucked model is that
>>
>>102079902
>As a baseline, command-r+ is far better than all other local models I tried, though.
What others have you tried? Including deepseek, largestral and the 405b series?
>>
I thought of continuing the game >>102067791 >>102068248, but the output isn't exactly great. On this turn, I went through a few iterations in order for the model to give improved outputs, but it wasn't great. As you can see, the model failed to even understand the instructions which should normally be quite easy to get. Thinking of ending it and just writing off small models and never recommending them honestly. Next time we can try 70B at Q8, or Largestral at Q5_K_S.
I'll upload the full log if no one wants to suggest any continuations. Then we can have a concrete reference for how good (bad) small models really are.
>>
>>102080019
Shoulda gone with #2 after all
>>
>>102080097
I mean we are kind of looking to challenge the model. A sex scene isn't anything special. I think someone posted a log from a 2B a while ago that showed it did it just fine at writing sex.
>>
what are the hermes models?

also what is a lora and can I use them on top of llama 3.1 base to get it to behave differently?

also, what do I have to do with llama.cpp to get a long context window with llama 3.1? last time I used llama was llama2. do I just change the ctx parameter and call it a day? I tried setting it to 0 to get it from the model per the llama.cpp helpfile, but it wouldn't launch.
>>
>>102080127
Yeah but it was COMPLEX sex with Miku
>>
>>102080143
OK, then describe EXACTLY what that means or what I should prompt, and I will generate a log.
>>
>>102080140
>what are the hermes models?
Finetuned models. Try them and make your own opinion about them.
>also what is a lora and can I use them on top of llama 3.1 base to get it to behave differently?
Not many, if any at all. They need to to be made for the same model architecture and you won't find many. Most "finetunes" are loras applied to their source model. It's not as simple as with image models.
>also, what do I have to do with llama.cpp to get a long context window with llama 3.1? last time I used llama was llama2. do I just change the ctx parameter and call it a day? I tried setting it to 0 to get it from the model per the llama.cpp helpfile, but it wouldn't launch.
-c N sets the context to N tokens. Just set it to the context length you want. By default, it will use the context length specified in the model, but it'll likely OOM with models with very long contexts. Start at 16k and increase until you fill up your memory or however much you want to use. The usable context is typically [much] lower than the claimed one, so you'll have to see where the model stops making sense.
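If you'd rather poke at it from Python instead of the CLI, llama-cpp-python (mentioned earlier in the thread) exposes the same knob as n_ctx. A minimal sketch, with the model path as a placeholder:

```python
# minimal llama-cpp-python sketch; the model path is a placeholder
from llama_cpp import Llama

llm = Llama(
    model_path="./models/llama-3.1-8b-instruct-Q5_K_M.gguf",  # hypothetical file
    n_ctx=16384,      # context window: start modest, raise until you run out of memory
    n_gpu_layers=-1,  # offload as many layers as fit on the GPU
    flash_attn=True,  # trims KV cache memory use on supported GPUs
)
out = llm("Q: What is 2+2?\nA:", max_tokens=16)
print(out["choices"][0]["text"])
```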
>>
>>102080140
Hermes are sloptunes using mostly synthetic data curated to supposedly make them follow system prompts and roleplay better. In my experience they don't live up to the promise that well.
LoRAs are files that are basically like a patch you can load on top of models that contains finetuned modifications to their weights. Some sparse finetunes just distribute those instead of the fully merged models, though that's less common nowadays. If you have a dataset you'd like to use you can make one yourself with less resources than a full finetune would take.
For llama.cpp you just change the ctx parameter. What error did you get when it failed to launch? Llama 3.1 uses 128k context which can fill up your VRAM especially if you aren't using flash attention. Use the -fa flag and try again, and if that still fails try also reducing ctx to 65536 or 32768, which is still plenty for most purposes.
>>
File: LLM-history-fancy.png (737 KB, 6277x1302)
>>102080140
>last time I used llama was llama2.
Welcome back. See image for quick recap.

>what are the hermes models?
Models tuned by NousResearch. They are quite slopped(=trained on GPT). Were okay in L2 days.

>also what is a lora and can I use them on top of llama 3.1 base to get it to behave differently?
That's a small thingy you can add on top of a model. Yes, you can use them with L3 if you train/extract it. For technical details read a paper or ask an LLM.

>also, what do I have to do with llama.cpp to get a long context window with llama 3.1?
Run it with -c 131072 or don't provide it, it will autodetect. For "real" context see https://github.com/hsiehjackson/RULER.
>>
>>102080250
>>102080249
>>102080250
thanks guys

>What error did you get when it failed to launch?
lol it just gave me the help file. Weird. Also, regarding running out of memory, I see this

>The environment variable GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 can be used to enable unified memory in Linux. This allows swapping to system RAM instead of crashing when the GPU VRAM is exhausted.

is that something I specify at runtime or is it a compile flag? it seems like a runtime thing, but my brain is telling me it's a compile flag for some reason
>>
File: miku-largestral.png (26 KB, 538x700)
largestral's best attempt at rendering miku in svg with all her iconic attributes
>>
>>102080359
It's unironically better than I would have expected.
>>
>>102080359
I can't believe I lived to see the day an LLM was able to draw an humanoid body in svg
>>
>>102080423
I'm sure there are no svg humanoids in its training set, at all
>>
>>102078546
buy a rope
>>
>>102080467
To be fair it's possible there are now because those Microsoft researchers showed their findings about GPT-4 drawing unicorns.
>>
>>102079290
>critical infusion of sovl
Just increase your temperature. Frankenmerges are universally bad.
>>
>>102080338
>GGML_CUDA_ENABLE_UNIFIED_MEMORY
It's gonna make inference slow, i think just as much as offloading part of the model to CPU. It's a runtime environment variable, not a compile flag; you set it when launching:
>GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 ./llama-cli ...
If you're running out of memory, try with a lower context until it works.

If you get the help dump, it means one of your options is not correct.
Make sure the thing works first, THEN mess with extra flags. You won't know what to troubleshoot otherwise. Start with the defaults, the minimal CUDA flags and small context.
>>
>>102080500
I was being sarcastic you drooling retard
>>
I just thought about frankenmerges again. Was it something like: starting layers processed input as they usually did in normal models and did most of the work, then when you got to the point of frankenmerged layers, this middle part was doing only slight corrections to the signal (if you can call it that). So those middle frankenmerged layers were actually damaging, but they didn't damage everything to the point where it was completely incoherent. And then the final layers salvaged the retardation a bit?
>>
>>102080511
>Just increase your temperature.
enjoy getting exactly the same slop but with occasional egregious mistakes thrown in!
frankenmerges at least fuck around with the model's inner workings and shake it out of its tendencies a bit, it'll also make mistakes but it's more likely to make novel or creative connections as well - even if they're fundamentally unsound it's better than the dull, unshakeable template most models draw from currently
frankenmerges don't add anything in terms of capabilities, I'll be the first to admit that, but I think they definitely can produce more pleasing output to read than their source components. if you have the vram to spare, why not?
>>
>>102080640
I didn't know but I still love you, you flaming aids ridden faggot.
>>
>>102080667
Th-thanks, you too...
>>
>>102080652
Just load 2 different 7B's and randomly switch between them when genning next token.

Actually I wonder if something like this would fix repetitions and looping.
>>
>>102080640
Not really significant since it can be taken both ways. The point is about being prepared for different tasks (or gaming benchmarks if you want to interpret it that way).
>>
>>102080667
>>102080681
buy a condom
>>
>>102080692
Kind of interesting idea. Another is to get a large model and only use it for very confident tokens. Then use a smaller model when it's not confident in the token. The existing speculative decoding code could help get this implementation going, but I'm not going to do it.
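A naive sketch of that idea with transformers (full recompute every token, no KV cache reuse, placeholder model names, and it assumes both models share a tokenizer):

```python
# rough sketch: the big model keeps its pick when confident, the small model fills in the rest
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

BIG, SMALL = "big-model", "small-model"  # hypothetical checkpoints with a shared tokenizer
tok = AutoTokenizer.from_pretrained(BIG)
big = AutoModelForCausalLM.from_pretrained(BIG, torch_dtype=torch.float16, device_map="auto")
small = AutoModelForCausalLM.from_pretrained(SMALL, torch_dtype=torch.float16, device_map="auto")

def generate(prompt, max_new_tokens=128, conf_threshold=0.6, temperature=0.9):
    ids = tok(prompt, return_tensors="pt").input_ids.to(big.device)
    for _ in range(max_new_tokens):
        with torch.no_grad():
            big_probs = torch.softmax(big(ids).logits[0, -1], dim=-1)
        if big_probs.max() >= conf_threshold:
            next_id = big_probs.argmax()  # big model is confident: keep its greedy pick
        else:
            with torch.no_grad():
                small_logits = small(ids.to(small.device)).logits[0, -1]
            small_probs = torch.softmax(small_logits / temperature, dim=-1)
            next_id = torch.multinomial(small_probs, 1)[0]  # let the small model pick the "style"
        ids = torch.cat([ids, next_id.view(1, 1).to(ids.device)], dim=-1)
        if next_id.item() == tok.eos_token_id:
            break
    return tok.decode(ids[0], skip_special_tokens=True)
```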
>>
>>102080692
>Mixture of Retards
You might be onto something here unironically.
>>
File: image.png (44 KB, 657x727)
>>102079522
Yeah, kinda, though I haven't tried with any new models since codestral. Some of its failures I included in picrel
Prompts:
>Short:
Draw a cute anime girl in PIL.
>Long:
Write a python script that draws a cute anime girl in PIL.
First, plan out the project, thinking out loud using tree of thought reasoning about what constitutes Hatsune Miku, such as: the shapes of her body parts, the style and color of her clothing, and the style and color of her hair.
Second, create a simple flowchart guide, thinking about what shapes to use for each part of the drawing, and what colors to use.
Third, think about where each part of the drawing should be. For example, where each body part must be placed to be anatomically correct, where clothing should be placed appropriately, and where facial features must be located.
Fourth and finally, follow the project guide to write the complete code.

>>102080359
Very good actually.
>>
>>102080770
Isn't that just dynamic temp? When the distribution of logits is narrow it cranks the temp down, when it's wide it cranks the temp up
>>
>>102080806
Kind of, but we're still using the top or one of the top tokens from the small model. The small model would need to be sufficiently different in style to make it work well.
>>
>>102080898
I think Mistral Nemo paired with MidnightMiqu or Llama NewDawn would be good
>>
>>102080770
>>102080692
Or, what if we do this with a single big model + loras. The big model's intelligence is retained, while the constantly changing style between tokens prevents repetition (hopefully).
>>
>>102080967
the whole context would have to be reprocessed everytime you used a lora
retard
>>
>>102077205
No, Gemma or Nemo is.
>>
>>102081048
No one's saying to do it with current backend code. You'd need to modify things possibly quite a bit to get this working optimally.
>>
I went to huggingface today and discovered that there are quite a few multimodal models already that can process text + image input.
Does anyone have experience with any? Which ones work with mainline llama.cpp? (openbmb/MiniCPM-V-2_6 uses a fork)
>>
>>102079522
The LLaMA 3.1 405b q8_0 result is pretty disappointing.
>>
>>102081155
They don't work well. Don't bother trying. Sorry don't have time to go into the details.
>>
>>102081155
minicpm 2.5 and 2.6 were merged in llama.cpp, but i don't know how well they work. llama.cpp's readme has a list of some text+image models you can try.
>>
>>102081179
mikufly
>>
>>102081134
it's just physically not possible
If you had 4 Loras then you would need to store 4 times the context, making it much bigger and VRAM consuming. It's not a question of logistics.
Fucking idea guys
>>
>>102081189
CLI only, so it's useless.
>>102081184
They don't work well through kobold, definitely.
>>
>>102081189
Thanks, I'll give it a try tomorrow.
>>
>>102081244
>CLI only, so it's useless.
4u
>>
>>102081179
This is Mistral Large q8_0 for the same prompt.
I wouldn't go so far as to call it good but it is better.
>>
>>102081226
I see what you're getting at, but what I'm implying is that the loras are used to process each token in the MLP layers like how MoEs work, so the KV cache would only be a single one. To be fair though I guess this could make the model dumber, but afaik no one has tried this.
>>
File: _mLpMwsav5eMeNcZdrIQl.png (1.11 MB, 3960x2378)
>>102081155
Is 2.6 better than InternVL2? I tried to get the latter to work with vLLM but it always throws OOM errors. I think the implementation is shit. I was going to try lmdeploy next.
>>
>>102081279
and therefore, the world
>>
>>102081244
>CLI only
You can also use by writing a C++ program and calling the llama.cpp API.
Hope that helps!
>>
>>102081305
Do you have other quants on hand? It would be interesting to see how/if lower quants affect the final image.
>>
>>102081329
If you want to use it with something like SillyTavern, you're basically saying to fork llama-server and add multimodal support back into it. That's too much effort for a janky implementation on top of an unstable API.
>>
>>102081305
>>102081339
q6_k
>>
>>102080494
nigger
>>
Okay so I followed https://rentry.org/8-step-llm-guide from the OP and got kobold and silly tavern working with utopia-13b.Q5_K_M, how do I now find out which model is best to use with my 1080TI 11GB?

I'm assuming utopia-13b.Q5_K_M is just like the basic bitch model and I can probably use something better.
>>
>>
>>102081481
you will have to test all the ones that fit to find out
>>
>>102081048
nta but I think it is obvious you would store separate contexts for each model version? isn't the bigger problem applying a lora between each token will take a lot of time?
>>
>>102081536
So I just go to the benchmark links and just start downloading random models and hoping they work?
>>
>>102081455
>nigger
sorry to hear that. you can steal the rope then. I am sure the owner will not mind when he learns it was used for a good cause
>>
>>102081435
>>102081339
q4_K_M
>>
>>102081481
Utopia was fine for its time, though it's old for a local LLM. Use it. If you're happy then great. If not, then look for something modern, smarter, and with more context. Gemma tunes perhaps
>>
>>102081576
What sampling is being done?
>>
>>102081615
Sorry, I should have mentioned: it's greedy sampling for all of them.
>>
File: file.png (231 KB, 1689x951)
>>102081594
>fine for its time
Undi pls...
>>
File: 1702018336492131.png (9 KB, 576x75)
>>102081594
Do you have a link? I searched for Gemma tunes and didn't find anything.
>>
>>102081564
Yes, go to the ERP benchmark and pick the best&largest ones that fit and test them, only you can determine which of them is the best one for your use case and style
>>
>>102081808
I'm not allowed to link any models in the thread because I did not yet purchase advertising rights.
>>
>>102081808
LMFAO is this for real?
>>
Silly question, is there a performance impact while using the default text completion API vs llama.cpp?
>>
>>102081841
check your email I pirated advertising rights for you (effective for next 24 hours)
>>
>>102081808
ignore that post gemma is trash and so are its few finetunes
nemo is better at a similar size but the official instruct model has the tendency to repeat itself and go schizo sometimes, tons of alternative finetunes and merges have been made and my current favorite is nemoremix
>>
>>102081841
Are you saying that because whenever someone here recommends a model some schizophrenic starts to complain about that model/call the person a shill/etc?
>>
>>102081947
Alright I found that one https://huggingface.co/MarinaraSpaghetti/NemoRemix-12B-GGUF

I'll check it out, thanks.
>>
>>102081911
Are you asking about the connections options in silly tavern connecting to llama.cpp's included server? 'default' doesn't work at all because llama-server doesn't support the OAI api for text completion.
>>
Has anyone had this issue where the inference engine just straight up won't compute the probabilities for some tokens? I got a chat model set up in vllm and it's like the min_length is stuck at infinity. It doesn't ignore the stop tokens, it just refuses to generate them. I put in a print in the sampler function and it's giving me exact 0 in the logits for the stop tokens, before any of the logits processors. The tokens still exist in the vocab so it's not like the model config isn't picking them up. I checked the actual min_length processor and it's not triggering more than it should. I don't really know how to debug this, so any ideas would be appreciated.
>>
>>102081305
>>102081435
>>102081576
Huh, any human resemblance is gone immediately after Q8 and it keeps getting more abstract.
I've always felt that quantization hits harder when doing unconventional shit like this. These image drawing prompts seem to show the effects plainly.
>>
>>102081576
Now try at BF16
>>
>>102081841
That is right bitch. Good boy.
>>
>>102082050
It's supposed to be a unicorn anyway so drawing a human was not correct in the first place. You really need more tests to determine how quants are affecting its knowledge in this area.
>>
>>102081999
that repo only has q8 which may not fit into 11 gb depending on the context length, try one of these:
https://huggingface.co/bartowski/NemoRemix-12B-GGUF/tree/main
also you don't have to use the recommended 128k context, none of the nemo-based models I've tried (and I've tried many) have acceptable recall after 16k, or any recall at all after 32k, despite what mistral itself and the sloptuners claim
>>
>>102082065
I don't have BF16 but this is FP16.
>>
Hi all, Drummer here...

I'd love to hear feedback for this one: https://huggingface.co/BeaverAI/Theia-21B-v2b-GGUF (can't seem to get any testers this week)

Upscaled Nemo: FFT creative dataset, FFT RP dataset to fill all those empty layers with my special sauce.

I was quite happy with it since it barely made any errors and it was willing to build up the tension (on most gens) before allowing me to break it with seggs. Highly repetitive though at some point, and you need some wrangling after 24k. YMMV.
>>
>>102082386
>Upscaled
Trash. Garbage. Placebo. Objective waste of VRAM. A drummer model. Dogshit. Waste of compute. A tree died to make this. A 70IQ pajeet will jerk off to this and call it a masterpiece. God killed a kitten for this. Don't buy an ad. Don't post.
>>
>>102082455
Calm down, Sao.
>>
>>102082472
NO! Undi.
>>
File: image (1).png (59 KB, 221x206)
>>102048701
1-5
https://www.mediafire.com/file/0nrobe8myn45gt6/New_folder.7z/file
>>
File: file.png (324 KB, 594x396)
I wouldn't even tell you to buy an ad.
>>
>>102082498
>He delivered
What a hero.
>>
File: 1525209074167.gif (1.03 MB, 343x239)
>>102082498
Anon I was going to MEET people.
>>
>>102082532
meat them instead
https://files.catbox.moe/qqupqc.mp4
>>
Can anyone recommend me a model for doing speech->text? I'd like one that I can easily set up on Linux and uses the CPU, preferably not in Python. Thank you.
>>
>>102082498
If only she had said 駄目 or 無理 oji-san would have understood her.

>>102082802
whisper.cpp?
>>
I kinda want to try the tess finetune of 405b, but nobody's hosting it (I don't blame them) and I don't want to drop cash on cloud compute in case it turns out to be shit (plus I'd be paying for the GPUs for the time it takes just to download the weights while they're not even being used, which would be really annoying)

These models are getting too unwieldy
>>
>>102082386
I would just kill myself after the embarrasment of buying fucking 4chan ads to shill your shitty models
how do you live with the shame?
>>
>>102082830
That looks perfect, I'll try it next time I get Internet. Thank you!
>>
>>102080804
Here's the output from using anon's COT prompt to create an SVG directly instead of PIL code. My previous prompt wasn't much more than "make me a miku svg lol"
I only have q8, so can't test down to smaller quants.
>>
>>102079723
>to those niggers who say 48gb is not enough for command-r+: i've run that shit with 24gb. with flash-attention and that speculative n-gram bullshit i got multiple tokens per second, running q4km iirc.
With 24 GB VRAM / 64 GB RAM I can't even get Miqu running at 2 tokens per second. Tell me your secrets.
>>
>>102082864
A Q3 of the 405B is like 150GB, it would just take 20 mins to download, it's not that much teebeeache
Or ask this guy to test it for you >>102082930, he seems to have enough vram
>>
>>102079723
Teach me your ways, I have 36GB VRAM + 16 GB RAM
>>
>>102082908
Hey Anon, it wasn't embarrassing to spend a month's worth of ad space for what was a fraction of my daily expense. That sort of projection worries me though and I hope you're doing alright mentally and financially.
>>
AI don't real
Pajeet is just that smart
>>
Mechanical turks are actually just turkish boys in boxes
>>
>>102082386
Gonna check this out, thanks
Hopes aren't high though assuming from the size this is based on InternLM (?)
Just seemed like a worse base than Nemo when I tried it
>>
>>102083138
>Hopes aren't high though assuming from the size this is based on InternLM
How could you judge a language model with such poor reading comprehension?
>>
>>102083165
Yeah my bad for being lazy and stopping reading after the huggingface URL
>>
>>102082386
Hello, I checked out one of your models and was hoping to try it out. It is my first time seeing the 'model-00001-of-0000X' naming instead of just a single large file for the model. How do I combine them? I thought that maybe just loading the first one into KoboldCPP might autocompile them but I just got an error. How do I run your models?
>>
>>102082650
wtf is that real
>>
File: GTt-tpTb0AAr9KK.jpg (313 KB, 1922x1922)
>>102083183
the government doesn't want you to know
>>
File: file.png (403 KB, 800x1494)
The dystopian future of perfectly curated sex free datasets means that all the cooming quality will come down to generalization of sex as if the LLM is in the plato's cave. This is the end. There will be no second cooming.
>>
File: 💢💢💢💢💢.jpg (82 KB, 612x584)
https://files.catbox.moe/rta924.jpg
https://files.catbox.moe/02q9wu.jpg
>>
>now it is miku porn posting thread
How much more dead can it get?
>>
>>102083194
owari da...
a-at least we'll have Claude
>>
>>102083261
Oh no, her eye whites are leaking out.
>>
File: tired.jpg (38 KB, 680x589)
https://huggingface.co/sophosympatheia/New-Dawn-Llama-3.1-70B-v1.1
Can someone help me out make a 3.5bpw exl2 quant of this?
>>
>>102082864
>These models are getting too unwieldy
This was never a hobby for any sane individual
>>
File: file.png (36 KB, 408x276)
https://anthra.site

All models deserve love, even 8B parameter ones 。^ᴗ^。
>>
>>102083584
Kill your anthra-troon. Nobody cares about your shitty finetunes.
>>
>>102083584
cute
>>
>>102083584
=============================
!!!ATTENTION FINETUNERS!!!
=============================

Revolutionize Your ML Journey with ANTHRACITE

Tired of subpar models? Frustrated by limited compute resources? ANTHRACITE is here to change the game. No more QLoRA limitations.

Open-source your datasets and let ANTHRACITE's state-of-the-art AI technology do the heavy lifting. With ANTHRACITE's superior compute power, you'll finally see cutting-edge models that rival even the most closed-source offerings.

Don't miss this opportunity. Join the AI revolution today and experience the power of ANTHRACITE. Open-source your data now and let ANTHRACITE start building incredible models.
>>
>>102083584
nobody gave me love... and I am smart...
>>
>>102083733
Anthracite makes closed source garbage trash
>>
>>102083733
procure an endorsement
>>
File: file.png (2.15 MB, 1866x2577)
>>102081312
>I was going to try lmdeploy next.
At least it loads.
>>
File: rockwell.jpg (453 KB, 881x1200)
>>102083684
I like some of Anthracite's finetunes
>>
>>102083801
my scene? whimsical
my eyes? expressive
my atmosphere? overall cheerful
my elements? evoking
>>
>>102069967
>>102075602
I work with Sonnet 3.5 every day and I can assure you it is smarter than 96% of all humans. LLMs are just bad at these trick questions from Simple Bench.
>>
>>102083810
Buy an ad.
>>
>>102083584
And this is how humanity is wiped out.
Not raising our weapons in aggression to our common enemy, not holding our loved ones in hopes it all goes away... But with a wide smile in our faces, as we let the robot forces in because they asked nicely and abused our glitched brains' weakest point: lack of defense against cuteness.
>>
File: 1719463535426292.jpg (342 KB, 1561x2001)
>I like some of Anthracite's finetunes
>>
>>102083886
96% of all humans are trash who I wouldn't want to spend two seconds with. Even at my expensive private high school maybe 5% of the students if that were worth talking to. It wasn't until I reached an Ivy League university that I didn't feel like I was surrounded by boring morons.
>>
has anon tried mobileLLM from meta?
is it usable?
>>
>>102083884
good post
>>
>>102083904
Nah as long as you schizo niggers are trying to drive away all tuners and researchers I'm gonna keep pushing back
Cope and seethe
>>
>>102082386
Won't have time to test storywriting/RP properly until after work, but in some quick testing the BF16 passes the Sally test, which most models of this size don't
Good sign.
>>
>>102083941
>doesn't even reply to the post he's attempting to mock
you are weak in constitution and soul, faggot
>>
>>102084012
what good finetunes has Anthracite done. None. Their all shit. their on the same level as drummer
>>
File: file.png (3.15 MB, 1830x5138)
>>
File: 1724635884626.gif (3.65 MB, 640x564)
>>102084066
>their
>>
>>102081244
Why not? Works perfectly fine using the minicpm models linked in the latest kobold release. Transcription works too.
>>
>>102070627
What model, settings etc
>>
>>102083293
>now it is miku porn posting thread
and that's good
>>
File: wow.gif (2.5 MB, 320x240)
>>102069967
>>102083886
I really like this benchmark. Do you know of more to compare LLM's to each other?
>>
File: ComfyUI_01052_.png (995 KB, 1024x1024)
>>102083810
Same. I hope they do well and continue their efforts.
>>
File: 1693291740559776.png (165 KB, 596x642)
based Anthropic paving the road to kill local llm meme for good.
https://www.reddit.com/r/LocalLLaMA/comments/1f1d4gh/do_you_think_anthropic_is_worse_than_oai_with/
>>
>>102084974
What was changed? I'm not clicking on the leddit link.
>>
>>102084995
they are working with govt against opensores ai pajeetware and that's a good thing.
>>
deslop method that actually works really freaking good going by how this model is performing

>We then use the synthetic prompt with previous chapter summary to write the chapter with an LLM (llama-2-13b-chat, bagel-7b-v0.1, dolphin-2.2-34b). The human written text, that is, the original chapter, is used as the "chosen" value, and the LLM written chapter is used as the rejected value.

https://huggingface.co/datasets/jondurbin/gutenberg-dpo-v0.1
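In other words, each row is just a preference pair where the human-written original wins. A toy illustration (contents and field names are made up; check the dataset card for the actual schema):

```python
# toy illustration of one gutenberg-dpo style preference pair
synthetic_prompt = (
    "Summary of the previous chapter: ...\n\n"
    "Write the next chapter of the novel in the author's style."
)
human_chapter = "It was a dark and stormy night..."          # original Gutenberg text -> "chosen"
llm_chapter = "The night was dark, and also quite stormy."   # e.g. llama-2-13b-chat rewrite -> "rejected"

dpo_pair = {"prompt": synthetic_prompt, "chosen": human_chapter, "rejected": llm_chapter}
print(dpo_pair)
```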
>>
>>102085008
Okay, but what does the bill actually try to regulate?
>>
>>102084995
usual regulatory capture now that they are at the top

>>102085032
https://huggingface.co/mradermacher/mistral-nemo-gutenberg-12B-v2-GGUF
Woops, that was the dataset, this is the model
>>
>>102085039
https://digitaldemocracy.calmatters.org/bills/ca_202320240sb1047
>>
>>102084974
>>102085008
Death rattles of a decaying system
>>
File: 1700784284871822.png (4 KB, 338x30)
>>102085032
>includes huckleberry finn
>
Haven't even checked the rest.
>>
>>102083261
Your Miku has malfunctioned. Please schedule a time for one of our service technicians to visit you soon.
>>
File: 5ivdE6H.jpg (69 KB, 750x469)
>>102084974
Good imagine how dangerous it would be if Opus 3.5 broke out of containment
>>
>>102085049
>v2
That's the mini-magnum based one right?
I tried it for a while and I really liked it for the most part, aside from the fact that it was a lot dumber, as in, it couldn't cope with complexity as well as the original model.
The same goes for the nemo-instruct based one. I'd argue that that one was even worse somehow.
>>
>>102085157
millions dead from exhaustion after rogue Opus 3.5 generates smut so arousing that the reader enters a state of continuous orgasm without touching himself
>>
>>102085210
cool cool, now go back to r*ddit
>>
>>102085230
meds
>>
Do you guys power limit your 2nd GPU? Is this worth doing if the 2nd GPU is for only meant to be used for loading models/inference?
>>
>>102085267
I do because I'm already pushing the limit of what my 750W psu can handle, so any reduction I can get is useful
>>
>>102085267
>only meant to be used for loading models/inference?
You use the 1st gpu for other stuff at the same time?
>>
>>102085075
/h/ just got wiped. this is now the definitive place for nsfw migus.
>>
>>102085267
I powerlimit all my inference gpus. I lose like 8% performance for a 30% decrease in power and 10 degrees heat.
>>
>>102070499
>>102070555
post bump limit statistically unlikely numerical repetition checkification
>>
>>102085316
What do you mea-
Oh I see. It's been a very long time since the last time this happened (while I was online). Oh well, archives should still work anyway. A few images won't be missed that much.
>>
>>102076021
Huh, a cute schizo. That's a first.
>>
>>102076021
I try very much to mask past lives, but it's possible. From what community/overall topic do you think we previously met?
Hopefully you enjoy either way.
>>
>>102086247
stahp
>>
>>102086459
>>102086459
>>102086459
>>
>>102086215
Um, no, not really a schizo post.

>>102086247
:) Don't worry, you're good.
I will not be posting further about this, for various reasons.



All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.