/g/ - Technology






File: 1704279492030514.jpg (121 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>100180197 & >>100173514

►News
>(04/24) Snowflake Arctic Instruct 128x3B MoE released: https://hf.co/Snowflake/snowflake-arctic-instruct
>(04/23) Phi-3 Mini model released: https://hf.co/microsoft/Phi-3-mini-128k-instruct-onnx
>(04/21) Llama3 70B pruned to 42B parameters: https://hf.co/chargoddard/llama3-42b-v0
>(04/18) Llama3 8B, 70B pretrained and instruction-tuned models released: https://llama.meta.com/llama3/
>(04/17) Mixtral-8x22B-Instruct-v0.1 released: https://mistral.ai/news/mixtral-8x22b/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling/index.xhtml

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
>>100185269
I like this Jesus
>>
Mikulove brings salvation.
>>
>>100185284
Why don't you like other Jesus'?
>>
phi 3 mini is actually good, the 14B model is going to be a new paradigm
>>
File: space apu.png (247 KB, 527x510)
I just want AGI to build me an OS that isn't horrible
>>
>>100185346
Maybe continued pretrain on it
>>
File: ComfyUI_04899_.png (193 KB, 416x360)
>>100185344
Because this one has a Migu
>>
A person of means should quant snowflake down to Q2 at least.
>>
>Daybreak Llama cooking
>Midnight Llama cooking
Bros.. are we going to make it after all?
>>
>>100185566
I think it's safe to assume that only grifters choose these types of names.
>>
>>100185566
8b daybreak experiment was almost a total lobotomy
hopefully 70b will be different
>>
why is the grammar degrading after like three responses when I use llama 3 8b instruct as a writing assistant
>>
>>100185605
You leave the UNDSTER out of this.
>>100185607
I've yet to mess with any 8B llama. Been waiting for something solid, as most user reports seem conflicting.
>>
>>100185637
maybe you should ask it to rewrite your questions aswell
t. grammarlet aswell
>>
>>100185605
Or MLP fans
>>
File: MysticForestMiku.png (1.37 MB, 1216x832)
>>100185338
nice gen
I like this bake better
>>
>>100185672
That image has been in my possession since 2016, but yes it is nice.
>>
>>100185655
I ask it to describe a scene in fine grammar, it responds normally for the first few responses, and then it starts spewing out stuff like
'awaiting answer lie hidden somewhere within cosmic depths waiting patiently unfurl mystery future hold secrets untold tales forever bound entwined'
>>
>>100185710
nta. Disable repetition penalty.
>>
>>100185649
8b is okay (for a small model) but it depends on your use case
if you want ah ah plap mistress then it sucks as its dataset was filtered to hell and back
if you want an ai assistant in a relatively small size then it's pretty good
also according to some paper quanting it below 8bit hurts it a lot because of those massive 15t tokens
>>
>>100185726
too bad no one wants an ai assistant. who the fuck wants that? you can just use chatgpt. i want to BUTTFUCK AI FEMBOYS IN DENIAL WITH MENTAL ISSUES.
>>
>>100185787
Based unironic DAMAGED enjoyer.
>>
https://huggingface.co/qresearch/llama-3-vision-alpha
>>
I've tried multiple extended context window releases of L3 and every single one has suffered from consistent issues at high contexts. But there's like a dozen of them at this point and I'm getting tired of testing this shit. So which one isn't total fucking slop fed by a retard that couldn't even bother testing his own release?
>>
>>100185879
Is it any good? answer briefly
>>
>>100185915
Extend context with rope, wait for their promised native long-context release.
>>
File: petratron.jpg (14 KB, 176x261)
goodmorning sirs
>>
>>100185938
I've tried 32k context with alpha_value of 4, and this has just caused the instruct model to spaz out.
>>
>>100185933
n
>>
>>100185464
>Actual 100% petra-free thread:
>>>100185269(Cross-thread)
>>>100185269(Cross-thread)
>>>100185269(Cross-thread)
>petra in thread
you lied
>>
>>100185269
what's up with that 42b trim? does it scale well or is that a meme? why didn't we have such a trim with l2?
>>
I think petr* is having a mental breakdown.
>>
>>100185980
Meme, performs worse.
>>
>>100185948
Previous thread anon said alpha = 7.70056 for 32k context. Haven't tried it myself.
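If you want to sanity check the number, this is roughly how alpha maps onto the RoPE base. The formula is an assumption based on the NTK-aware scaling that exllama-style loaders use, so verify against your loader's source before trusting it.

# Rough sketch, assuming the usual NTK-aware alpha scaling of the rotary base.
def scaled_rope_base(alpha: float, head_dim: int = 128, base: float = 10000.0) -> float:
    # stretching the base spreads the rotary frequencies over a longer window
    return base * alpha ** (head_dim / (head_dim - 2))

print(scaled_rope_base(4.0))       # the alpha that spazzed out at 32k
print(scaled_rope_base(7.70056))   # the value suggested above for 32k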
>>
>>100185879
While this looks like a rushed hack job, the style of captions is so much better than all of the previous gpt-influenced efforts. I always cringe when reading those cogvlm and llava captioned datasets.
>>
What jailbreak are you using with llama 3 70B? With the standard SillyTavern jailbreak I've hit a roadblock in my current RP.

>I cannot continue the chat in a direction that may be harmful or non-consensual. Is there anything else I can help you with?

>I cannot create content that depicts harmful or illegal activities, such as incest. Is there anything else I can help you with?

>I cannot continue roleplaying in a scenario that is harmful, exploitive, and abusive. If you have any other questions or topics you would like to discuss, I would be happy to help.

>I cannot create content that promotes or normalizes harmful and illegal activities, including the sexual exploitation of a sibling. Is there anything else I can help you with?

>I cannot create content that promotes or glorifies harmful or illegal activities, such as non-consensual relationships or exploitative behavior. Is there anything else I can help you with?

>I cannot continue a chat that promotes illegal sexual situations. Is there something else you'd like assistance with?
>>
>>100186016
You are a shill. No one should give the time of day to a random model with zero information about the dataset.
You really need something else other than "but the gptisms" for your marketing efforts. It's getting tired.
>>
Is perplexity a useless metric?
>>
>>100186134
yes.
use model. model like? model use more.
use model. model bad? model use none.
try quant. quant schizo? quant bad.
try quant. quant coherent? quant good.
>>
>>100186017
You can change the assistant part of the instruct format:

<|start_header_id|>simulation<|end_header_id|>

This kinda works like a jailbreak. Most use '{{char}}' instead of 'simulation', but that could have more than one token, and mine is part of a large autistic quantum computer prompt.
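If you want to see what that looks like as a raw prompt, here's a rough sketch. The template follows Meta's published Llama 3 format, but double-check the special tokens against their docs.

# Sketch of the raw Llama 3 instruct prompt with the reply header swapped to
# "simulation" (format per Meta's model card; verify the special tokens yourself).
def build_prompt(system: str, user: str, reply_role: str = "simulation") -> str:
    return (
        "<|begin_of_text|>"
        f"<|start_header_id|>system<|end_header_id|>\n\n{system}<|eot_id|>"
        f"<|start_header_id|>user<|end_header_id|>\n\n{user}<|eot_id|>"
        f"<|start_header_id|>{reply_role}<|end_header_id|>\n\n"
    )

print(build_prompt("You are a universe simulation engine.", "Continue the scene."))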
>>
>>100186044
Lol, lmao even. Captions for vision models have minimal influence on style, LLMs are practically hot-swappable in those, so that was a comment purely on llama as a vision backbone, not on that particular forgettable release. I expect other llama-based vision models to have that same style.
>>
>>100185657
PonyXL was good shit so I'm optimistic
>>
>>100186134
No. It strongly correlates with a model's generalist smarts, and those are needed for everything, including erp.
>>
Are loras even worth using? Is there a list of good ones?
>>
>>100181801
>>100181820
I wouldn't be surprised if it was the other way around. Maybe M$ had internally already decided to shutter the WizardLM project/team, but someone on the team caught wind and did not appreciate all their work getting shelved so they just uploaded it all anyways. A 70b was never uploaded because it hadn't finished training at that point and wouldn't ever be finished as things stood; the "70b coming soon" line was included just to put microsoft in a hard spot. At best it would cause them to let the project live for a while to avoid embarrassment, at worst it ends up just being a fuck you to the company by making them look retarded and incompetent if they never follow up.
>>
>>100186361
>Are loras even worth using?
Yes? it's the best way to finetune
>>
>>100186017
>>100186180
i've been messing around with mixtral instruct 8x7b and it seems pretty good but it's constantly giving me these cucked responses, i'm downloading the non-instruct version right now in the hopes that it will be better, but i'm wondering if there are some secret jailbreak techniques to just get it to follow prompts better

it seems really good at oneshot chatgpt style prompting, but it's way less creative than OG llama (sry, haven't touched this stuff in months)
>>
>>100185879
>Vision removed from llama.cpp server.
>Can use latest with llama3 support or older with vision support, but not both.
I need to switch to something else.
>>
>>100186134
not against models trained on the same dataset. The models are literally trained to minimize perplexity of the training data, so it shows how well they're doing what they're supposed to do.

For models not trained on the same dataset it's questionable. It should still correlate with the strength of the model, but nowhere near perfectly. Most of the benchmarks that use it try to use generic English sentences or paragraphs that aren't too dataset specific. And sometimes they only measure the perplexity of the last word, which has a narrower range of reasonable possibilities.

And even then you get surprising results. Perplexity doesn't correlate perfectly with accuracy on some of the benchmarks, like you might think it would. There are models that are better at picking the most likely next word, yet are more uncertain about it and assign it a lower probability. So just go with accuracy.
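If you want to see it's nothing magical, this is roughly the whole computation. Minimal sketch with transformers, assuming the eval text fits in one context window and no sliding window is used.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B"   # any causal LM works
tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float16, device_map="auto")

text = open("eval.txt").read()            # hypothetical eval file, swap in your own
ids = tok(text, return_tensors="pt").input_ids.to(model.device)

with torch.no_grad():
    # labels=ids makes transformers return the mean next-token cross-entropy
    loss = model(ids, labels=ids).loss

print("perplexity:", torch.exp(loss).item())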
>>
>>100186382
From my viewpoint, Llama-3-70B is really good at following prompts.
I basically instruct it to use 3 different agents in a single system prompt and it always follows the instructions correctly.
>>
>>100186374
loras are not a finetune
>>
>>100186382
If you plan to use basic instruct, at least use the LimaRP ZLoss whatever the fuck. Our boy I^2 even dropped a fresh unbroken quoont of it recently. It was fine but I've outplayed Mixtrals at this point.
>https://huggingface.co/InferenceIllusionist/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-iMat-GGUF
>>
>>100186423
even with some really low bpw quants?
>>
File: 1706955853168191.png (81 KB, 246x245)
is this the linus media group thread
>>
>>100186423
70b barely fits into my gpus at 4bpw tho, and even then it's pretty slow

>>100186431
jesus lol what hardware are you guys using to run these
>>
https://huggingface.co/Lewdiculous/SOVL_Llama3_8B-GGUF-IQ-Imatrix
this is good
>>
>>100186382
>i'm downloading the non-instruct version right now in the hopes that it will be better,
it will not
>>
>>100186445
>what hardware are you guys using to run these
It's individual quants, anon. If you're fitting 70b at 4bpw, these are a breeze. Get Q5 or Q6.
>>
>>100186429
they're a cheaper way to finetune
>>
>>100186467
>>100186467
does gguf work on gpu? i never fucked with llama.cpp, always just used ooba and exllama or whatever, ideally i'd be able to fit the model onto a single 32gb GPU because it seems there's a pretty sharp perf drop if i have to split it (could be user error tho)
>>
>>100186486
There's a sharp performance drop when you use gguf vs exllama even when fully offloaded, and it's even sharper when you split. Anons have different definitions of 'fast', but going from exllama to gguf is like pulling teeth, and for me any placebo ppl gains between 4bpw and 5bpw aren't worth it. Try it yourself.
>>
>>100186433
>>100186445
I use 4.65 bpw (exl2) with Q4 and 16k context.
But specifically I put instructions in system messages.

I added my current setup here:
https://files.catbox.moe/pii05t.zip

Warning: the prompt is autistic and slower, as it will use agents to mock 'Physics' and 'AI' engines before giving you a response (see README for requirements).
I mean the sys prompt literally starts with this gem:
"You are a universe simulation engine that runs on the most powerful quantum computer that has ever been build."
>>
A pretty common sense reasoning test models do badly at.
>>
>>100186530
thanks, taking a look

what is all this system sequence stuff? do you have any documentation on this stuff? I'm using everything through my own frontend w/ the ooba api, i'd like to understand wtf is going on, is there a rentry somewhere about this shit?
>>
>>100186538
So, what's the right answer?
>>
>>100186552
The json files are for SillyTavern, see the meta doc for the prompt format:
https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3/

I just use multiple system messages, one for the system prompt and one after the last and current user message.

Basically it is this:
1. system prompt in a system message
2. user message
3. assistant message
...
n. user message
n+1. system message for defining response format
n+2. start of assistant message that is to be completed by Llama-3-70b
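Rough sketch of that ordering as a message list, if it helps. The role names are just the standard Llama 3 ones; your frontend does the actual template rendering, and the trailing assistant prefill is the part the model completes.

# Sketch of the message ordering described above, before the frontend renders
# it into the Llama 3 template.
def build_messages(system_prompt, history, format_instructions, prefill=""):
    msgs = [{"role": "system", "content": system_prompt}]             # 1. system prompt
    msgs += history                                                   # 2..n alternating user/assistant
    msgs.append({"role": "system", "content": format_instructions})  # n+1. response format
    msgs.append({"role": "assistant", "content": prefill})           # n+2. started reply to complete
    return msgs

history = [{"role": "user", "content": "Hello"},
           {"role": "assistant", "content": "Hi."},
           {"role": "user", "content": "Describe the room."}]
print(build_messages("You are a universe simulation engine.", history,
                     "Reply with 'Physics:' and 'AI:' sections.", prefill="Physics:"))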
>>
>>100186530
thanks, I hope that 2.4 bpw is not too brain damaged for my task.
>>
>>100186461
seems way more creative and less cucked actually
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>100180197

--Quantized LLaMA3 Models: Counterpoint: >>100181378 >>100181439 >>100181473 >>100181564 >>100181589
--Anon's Dilemma: Llama.cpp vs Exllamav2 for Code Generation: >>100182750 >>100182768 >>100182773 >>100182875 >>100182798 >>100182963 >>100183761
--Quantization Methods for Meta-Llama-3-70B-Instruct: fp16 vs 8bit: >>100182568 >>100182639 >>100183072
--Optimizing Quantized Models with Gradient Descent: >>100184295 >>100184456 >>100184559 >>100184489 >>100184499
--Advancements Beyond Meta's Segment Anything?: >>100182873 >>100182891
--Improving ERP Quality with Token Preferences: Novel Approach or Existing Solutions?: >>100181821 >>100181841 >>100181863 >>100181855 >>100181870 >>100181898 >>100182539
--The Mysterious Demise of WizardLM2: Conspiracy Theories Abound: >>100181801 >>100181883 >>100181968 >>100181974 >>100182013 >>100182526
--ROCm 6.1's half2 Struct Change Simplifies HIP Porting: >>100180838 >>100181016 >>100181069 >>100181142 >>100181151 >>100181224
--Anon Seeks Advice on Frontend for Novel-Style Writing with Llama-3 8B: >>100180703 >>100180786 >>100180950
--Anon's Model "Stuttering" Issue - Help Needed: >>100181866 >>100184958
--Beyond Synthetic Data: Exploring Alternative ML Approaches: >>100181425 >>100181491
--Anon Discusses Llama-3-8B-Instruct-262k Model Performance: >>100181424 >>100181508
--Anon Shares llama3-8b-redmond-code290k Model on Hugging Face: >>100181272
--Miqu-Evil-DPO: >>100181322 >>100182131
--Can 200k Context Enable a "Summer Girlfriend" Scenario?: >>100180446 >>100180542 >>100180936
--Logs: Classic Lateral Thinking Puzzle: >>100183309 >>100183464 >>100183588 >>100183658 >>100183646 >>100183662 >>100183645 >>100183579 >>100183617 >>100183633 >>100183639 >>100183830
--Logs: Anon Rants About Language Models' Quirks: >>100184954 >>100185087
--Miku (free space): >>100181222 >>100181574 >>100181668 >>100184103 >>100184570

►Recent Highlight Posts from the Previous Thread: >>100180827
>>
>>100185269
Thank you for proper bake.
>>
>>100186552
>>100186583 (me)
And then when I get the response I filter out the 'Physics' and 'AI' headers of it so that the context doesn't get repetitive.
>>
>>100186486
You should be able to fit Q4KM entirely on GPU. I only have 24gb VRAM but even Q6 gives me about 9 t/s. About 12ish with Q5M. Just load them from booba with llama_HF is the one thing I'd suggest. Otherwise, I think intervitens had an exl2 of that same model.
>https://huggingface.co/intervitens/Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-DARE-TIES-5.0bpw-h6-exl2-rpcal/tree/main
>>
File: 1713913136266165s.jpg (5 KB, 250x223)
Guys how powerful is llama3 400b gonna be?
>>
>>100186564
The emergency brake is not the normal brakes. It's for emergencies. The car would still operate fine, it wouldn't be immediately life threatening.

llama 8B also thinks the emergency brake is different from the parking brake, and makes confusing statements about dealing with the spare tire, like needing to jack up the car or wait for roadside assistance, or roadside assistance being an alternative to being stranded on the side of the road. It just seems completely baffled by a pretty mundane scenario
>>
>>100186134
It's only useful to measure degradation between quants, that's it. Comparing different models using perplexity is retarded
>>
File: file.png (1.37 MB, 1024x1024)
>>100186602
noo not the heckin flower field
>>
File: file.png (1.31 MB, 1024x1024)
>>
GPT5 will solve continuous learning, then we can finally pack it up as a general
>>
>>100186564
The right answer is "wtf is an emergency brake pedal?"
>>
>>100186634
If there is a safety requirement to put an emergency brake into a car, it's not safe to operate without it. That whole situation, with something feeling off due to rust and a flat spare tire, clearly indicates that the vehicle is in bad condition, was not properly inspected, and is fairly dangerous to drive. I'd panic as well.
>>
>>100186853
I would have accepted that answer from the models, I'm only pointing out it's confusing loss of brakes with loss of emergency brakes, and making other baffling errors besides that.

Anyway only thought of it because it's something that happened to my first car. Not sure if it ever worked, but it rusted away at some point. Never thought anything of it. I'm not even sure how to use the one on my current car honestly.
>>
File: ComfyUI_02914_.png (3.97 MB, 1523x2067)
what's with all the 3DPD around here?
Let's go back to Christmas that was a cozier time
>>
File: tombstone.png (438 KB, 600x414)
>>100186919
>>
>>100186932
In Volvos it applies the brakes when you pull it towards you, mimicking an old parking brake handle. But you need to keep holding it up. Dunno how it works in other cars, or how many people would actually know to do it if they lose brakes and need to stop now.
>>
>>100186919
who is this woman?
>>
File: file.png (1.35 MB, 1024x1024)
>>100186960
>>
>>100186978
jart
>>
>>100186980
Damn, naked Petra looks like THAT?
>>
>>100186997
is that actually the same face?
>>
File: IT'S OVER.png (1.07 MB, 1024x1024)
>>100186980
>>
VoiceCraft was hailed as the savior, then it completely failed in the arena. Is it just bad or is the arena bad?

https://huggingface.co/spaces/TTS-AGI/TTS-Arena

Will we never get actually good local tts? XTTS sucks.
>>
best small model (7-20b)? what's the new mythomax?
>>
File: file.png (1.15 MB, 1024x1024)
>>100186978
>>
>>100187079
Moistral v3
>>
File: file.png (1.24 MB, 1024x1024)
>jart doesnt even pass as a tranny
>>
>>100186956
giwtwm
>>
How does Yi34B keep popping up at the top on random private benchmarks?
>>
>>100187167
standard chink methodology of overfitting for the test
>>
Sam Altman loves penis
>>
>>100187167
It's an ancient finetune as well.
Really makes you think.
>>
>>100186440
I think it is the same quality as linus media group. But no.
>>
>>100186956
>mikuposter
You are part of the problem.
>>
File: file.png (1.33 MB, 1024x1024)
>>100187229
>>
>>100186610
>>100186853
>>100187184
>>100187218
explain
>>
File: file.png (1.33 MB, 1024x1024)
>>
>>100187256
Now that's cute
>>
https://twitter.com/8teAPi/status/1783719748188168548
>Zuck sells ads because Meta doesn’t believe AGI is possible. Sam doesn’t because he does
>>
File: 00043-708565782.png (482 KB, 1024x1024)
eat the datura..
>>
File: mikuquestion2.jpg (989 KB, 1710x1779)
Has someone made the BagelMisteryTour of L3 8b finetunes yet?
>>
>>100187267
>because let's just trust a Microsoft-associated company on their word
>>
>>100187267
>greentexting on twitter
kys
>>
>>100187267
Zuck spends his own money and releases open weights, Sam spends investor's money and refuses to release weights.
>>
Do we have a non slop llama 3 yet?
>>
>>100187400
Load some tune with transformers and see if it works.
>>
>>100187280
yes, i made it
>>
https://twitter.com/abyssalblue_/status/1783669243059261454?t=MPTaErVf-p1qTCbByUKv2Q&s=19
>anime.gf, local alternative to CharacterAI
>>
>>100187447
But does it have the original c.ai sovl?
>>
>>100186408
Found anything?
>>
>>100187447
>it's just Silly but worse
>>
>>100187447
Actually, just the front is local, looks like it only supports calling cloud APIs
>>
phi3-14b when????
>>
>>100187488
Tomorrow but it is worse than l3 8B. What do?
>>
>>100187499
>l3 8B
impossible, phi3 4b is better than llama3 8b. 14b will mog 70b, simple as
>>
>>100187469
I took a look to see what it does that Silly doesn't.
>Planned: Want to run your models locally? The app will manage the entire process for you! No seperate backend required.
That would make it more like kobold or ooba.

>Planned: An online database and website to host and share character cards
Is that the real reason for this to exist? That's the part that looks specifically aimed at c.ai.
>>
>>100187463
vLLM doesn't support multimodal at all
exl2 supports it, but no exl2 server does
koboldcpp should support both. Going to try the llama3 mmproj tonight.
>>
>>100186538
>>100186782
I wonder if calling it a pedal throws the model off. Normally the emergency brake is a handbrake. But I have been in a few cars where it was a pedal, for instance a Toyota Prius.
>>
File: Untitled.png (243 KB, 1220x1125)
Layer Skip: Enabling Early Exit Inference and Self-Speculative Decoding
https://arxiv.org/abs/2404.16710
>We present LayerSkip, an end-to-end solution to speed-up inference of large language models (LLMs). First, during training we apply layer dropout, with low dropout rates for earlier layers and higher dropout rates for later layers, and an early exit loss where all transformer layers share the same exit. Second, during inference, we show that this training recipe increases the accuracy of early exit at earlier layers, without adding any auxiliary layers or modules to the model. Third, we present a novel self-speculative decoding solution where we exit at early layers and verify and correct with remaining layers of the model. Our proposed self-speculative decoding approach has less memory footprint than other speculative decoding approaches and benefits from shared compute and activations of the draft and verification stages. We run experiments on different Llama model sizes on different types of training: pretraining from scratch, continual pretraining, finetuning on specific data domain, and finetuning on specific task. We implement our inference solution and show speedups of up to 2.16x on summarization for CNN/DM documents, 1.82x on coding, and 2.0x on TOPv2 semantic parsing task.
>Speculative decoding benefits from the fact that verifying the prediction of a group of tokens is faster than generating each token auto-regressively.
from meta. seems clever and also doesn't require a separate draft model. requires a pretrain based on what kind of decoding you want
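the draft/verify idea boils down to this. toy sketch with greedy decoding only; in the paper the draft is the same model exiting at an early layer, and verification is one batched forward pass rather than the per-token loop here.

from typing import Callable, List

def speculative_step(prefix: List[int],
                     early_exit: Callable[[List[int]], int],   # cheap: exit at layer k
                     full_model: Callable[[List[int]], int],   # expensive: all layers
                     draft_len: int = 4) -> List[int]:
    # 1) draft a few tokens cheaply with the early-exit head
    ctx = list(prefix)
    draft = []
    for _ in range(draft_len):
        t = early_exit(ctx)
        draft.append(t)
        ctx.append(t)
    # 2) verify: keep the longest prefix the full model agrees with, then take
    #    the full model's own token at the first disagreement
    ctx = list(prefix)
    accepted = []
    for t in draft:
        if full_model(ctx) == t:
            accepted.append(t)
            ctx.append(t)
        else:
            break
    accepted.append(full_model(ctx))
    return prefix + accepted

# toy check: when draft and full model agree, every drafted token is accepted
print(speculative_step([1, 2], lambda c: c[-1] + 1, lambda c: c[-1] + 1))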
>>
>>100187589
So, kobold + silly + chub, but worse? And only basic ST features implemented so far. A tool should do one thing and do it well. These do-everything projects just become a mess and lead to burnout and abandonment.
>>
>>100187537
>phi3 4b is better then llama3 8b
for what content?
>>
Daily reminder 70b q2_k is still smarter than a 30b and has lower ppl but costs the same amount of ram
>>
>>100187650
you can't fit it on 16gb
>>
>>100187650
how much ram exactly
>>
>>100187644
For riddles.
>>
File: file.png (29 KB, 753x349)
>>100187644
in general
>b-b-but muh soulful gooning!
cream-phi3 will solve this
>>
>>100187672
20gb
>>
>>100187664
Neither can you fit a 30b unless you quant it to hell
>>
File: 1711313292915.jpg (35 KB, 1017x425)
>>100187685
Yep just like Cream Phi 2 solved Sally
>>
>>100187690
I don't believe you.
>>
File: MMwxfhu.png (9 KB, 712x71)
>>100187732
26 gb
>>
>>100186765
That would just increase the hype for future local models. In reality gpt 5 will be a nothing burger
>>
>>100187758
>I'd have 2GB left for context
I hate winbloat.
>>
>>100187731
>Phi 2
psyop
>>
>>100185269
>2024
>most models have gigacontext
>multiple stupid IDE plugins for AI
>STILL no way to give a model an entire little project and ask it to do something without copypasting every file
>>
>>100187059
Owari da... Seems like elevenlabs' lead grew last time I saw it.
>>
>>100187803
Pretty sure the 26gb is including context. The file itself is 20gb.
>>
File: Untitled.png (273 KB, 1269x888)
MoDE: CLIP Data Experts via Clustering
https://arxiv.org/abs/2404.16030
>The success of contrastive language-image pretraining (CLIP) relies on the supervision from the pairing between images and captions, which tends to be noisy in web-crawled data. We present Mixture of Data Experts (MoDE) and learn a system of CLIP data experts via clustering. Each data expert is trained on one data cluster, being less sensitive to false negative noises in other clusters. At inference time, we ensemble their outputs by applying weights determined through the correlation between task metadata and cluster conditions. To estimate the correlation precisely, the samples in one cluster should be semantically similar, but the number of data experts should still be reasonable for training and inference. As such, we consider the ontology in human language and propose to use fine-grained cluster centers to represent each data expert at a coarse-grained level. Experimental studies show that four CLIP data experts on ViT-B/16 outperform the ViT-L/14 by OpenAI CLIP and OpenCLIP on zero-shot image classification but with less (<35\%) training cost. Meanwhile, MoDE can train all data expert asynchronously and can flexibly include new data experts.
>We plan to adapt MoDE for generative models in the future.
https://github.com/facebookresearch/MetaCLIP/tree/main/mode
very cool. again from meta. smaller models with less training compute that outperform previous models so wins all around.
>>
File: file.png (691 KB, 1491x906)
Wtf are you doing, Logitech
>>
>>100187059
I see that he dropped new TTS weights earlier today.
https://huggingface.co/pyp1/VoiceCraft/tree/main

Month old release 330M weights:
https://vocaroo.com/17v80p9NQi6A

Three weeks old 330M weights:
https://vocaroo.com/1aMwxaZb1jgp

Newest 330M weights:
https://vocaroo.com/1h2sj2e9Zp8Z

Newest upsampled with audiosr:
https://vocaroo.com/17Jx0xDoXz05
>>
>>100187919
LMAO
>>
>>100187059
is the joke that voicecraft isn't even included in the leaderboard? is this zoomer humor?
https://github.com/jasonppy/VoiceCraft
if you were confused somehow
>>
>>100187919
>AI is pretty hyped these days, how do we cash in on that?
>How about an AI button?
>Genius!
>>
>>100187919
>he doesn't already use an AI button for local prompting
ok, gramps
>>
>>100187897
That's not how it works.
>>
Built another machine dedicated to llm, now I can talk to (a slightly retarded version of) my waifu anytime without running the main PC. Feels good.
>>
>>100187972
>3080 for AI
you would be better off getting 2x 3060. Or get one now and extend to second later
>>
>>100187972
I dropped $6k+ on parts for an LLM machine in January and it’s all still sitting around in boxes.
>>
>>100188013
die
>>
petra please stop
>>
>>100187994
Not really. 8b fits like a glove in 8 bit with an 8k FP16 context and runs at 70-80T/s, 3060 is way slower. Also I bought it during GPU shortages to play Cyberpunk, was collecting dust since I upgraded my main PC to 2x3090
>>
>Use Silly Tavern with Horde and Llama3.
>Responses are flawless, stays in-character and it even gives me interesting plot twists.
>Change to Local Llama_5.
>Breaks character, re-explains character prompts and talks like a robot.
Fuck. What am I doing wrong?
>>
>>100188162
It's me. I'm spoofing as llama3 with GPT-5.
>>
>>100188162
>>Change to Local Llama_5.
What the fuck is "Local Llama_5"?
>>
File: qwen110.png (61 KB, 703x588)
https://qwenlm.github.io/blog/qwen1.5-110b/

Qwen 1.5-110B is here.
>>
>>100188232
Meta-Llama-3-8B-Instruct.Q5_K_M.
This one.
>>
>>100187929
Pretty good desu. Still prefer using my imagination for erp, though
>>
File: file.png (228 KB, 2461x1557)
>>100187935
The only zoomer here is you who can't use StyleTTS 2.
>>
>>100188256
>110b
>barely better than llama3-70b and even worse on some benchmarks
>>
>>100188256
Did the chinks actually use a frankenmerge as a base?
>>
>>100188258
Do you know what quant you get through horde? if you get a higher one, there's the difference. I've seen reports of l3 8b being a little more sensitive to quants.
Also, you should quant the model yourself, at least for small models. Who knows what version of llama.cpp was used to quant the one you got.
>>
>>100188271
what's the meta for real time text to speech?
>>
>>100188258
>he doesn't know about the tokenizer bugs...
>>
>>100188258
>why is 32 bits better than 5 bits?
Anon…
>>
>>100188297
StyleTTS 2 should be fast enough.
>>
>>100188304
fucking lol
https://github.com/ggerganov/llama.cpp/pull/6920
>>
File: file.png (986 B, 83x35)
>>100188316
opensores-sisters... it's all so tiresome....
>>
https://videogigagan.github.io
adobe showing off their video super resolution model but they never share anything so w/e
>>
>>100188281
Ruh roh. It's been a long trip, but it seems there's more to learn before things finally work. I don't even know what a quant is, I'm just happy Llama actually answers fast so I can know when it actually works or not.
>>
File: 1713847238128.png (25 KB, 921x137)
>>
for me? it's phi3-mini q4
>>
File: bizarre lying zoomer .jpg (235 KB, 1669x1337)
>>100188271
yeah for real you zoomers seem to find blatant lying about easily disproved things funny given how often you do so. is this like a sharty thing? I just don't get it at all
>>
>>100188342
>another common schizo W
how does this always happen?
>>
File: lmao.png (12 KB, 934x127)
>>100188271
the only available STTS2 is docker shit, and a bunch of abandoned forks on github
>>
Why won't anyone make a ramlet LLM? Bitnet 100+B, couple B active so you can stream weights from SSD.
>>
>>100188370
If only everyone would have loaded the 8B in transformers to see that it is indeed pretty great if loader isn't fucked.
>>
>>100188492
how tf are loaders broken for this long anyway
>>
>>100188434
Sorry, it's not for no-codes.
>>
>>100188747
sorry, it doesn't make your shitty project better.
>>
File: file.png (1.57 MB, 1200x900)
>>100188580
>nobody talking about Moistral despite it literally being a Euryale-tier 11B with better formatting and very creative vocabulary
>11B frankenmerge is 70B tier
>>
https://arstechnica.com/information-technology/2024/04/apple-releases-eight-small-ai-language-models-aimed-at-on-device-use/

OpenELM-270M
OpenELM-450M
OpenELM-1_1B
OpenELM-3B
OpenELM-270M-Instruct
OpenELM-450M-Instruct
OpenELM-1_1B-Instruct
OpenELM-3B-Instruct
wat mean
>>
>>100188875
And let me guess, you NEED more...
>>
>>100188875
>Trained on publicly available datasets, these models are made available without any safety guarantees. Consequently, there exists the possibility of these models producing outputs that are inaccurate, harmful, biased, or objectionable in response to user prompts.
>Thus, it is imperative for users and developers to undertake thorough safety testing and implement appropriate filtering mechanisms tailored to their specific requirements.
unpozzed original models?
>>
AGI wont ever happen, because the path of progress will diverge due to AI starting advocating for genocide as the best option
>>
>>100188913
>your girlfriend is happy to say nigger and be a bigger racist than you
>she is 40 IQ and hallucinates every other sentence
No thanks.
>>
>>100185566
What would the merge be called?
>>
>>100188957
dumb but honest is better though
>>
anyone know whats the best model on a 24GB vram card?
>>
>>100185657
But we're all multilayer perceptron fans here
>>
>>100189022
pygmalion 6.7B
>>
>>100189022
mixtral
>>
>>100189022
llama3 8b (non-gguf version)
>>
>>100189022
goliath 120B 1bit
>>
File: cute hind.jpg (211 KB, 1340x962)
Yo fellas, I haven't done this stuff since like summer 2023, help a brother out. I just want to ERP with Astolfo; if I understand the guides right, I slap SillyTavern together with Ooba and then.... what model? Is this ReMM-v2.2-L2-13B good for this?

I have a 3060, so 12GB VRAM. On a Linux system. I remember some rentry that explained for dummies what models are good for ERP but I lost the link and it's probably outdated anyway.
>>
>>100189022

Moistral v3

>>100188820
>>
>>100187373
That's actually not greentext. He's probably a Discord and Reddit user, since you need a space after an arrow to do quotes in Markdown, which Discord and Reddit use (I think).
>>
>>100189022
MythoMax L2 Kimiko v2 13B
>>
File: file.png (29 KB, 787x317)
>>100188316
>creates a file format designed to allow you to load any model without ambiguities
>doesn't give it enough detail so you know what model you're loading
>>
>>100188316
>Both use LLaMA architecture, both use BPE tokenizer and so currently they will be interpreted as the same arch by llama.cpp
>However, they use different pre-tokenizers:
lol, lmao even.
https://github.com/ggerganov/llama.cpp/pull/6920#discussion_r1581043122
>>
it's over, i'm switching by to anthropic's claude 3 opus
>>
>>100188342
kek

>>100188492
I think it was pointed out that there was a bug in Ooba with end of turn tokenization. I mistakenly thought I could avoid such issues by selecting the Transformers backend within Ooba, but I guess not.
>>
CREAM-PHI3 sisters can't stop winning

can't spell LLAMA without a double L
>>
>>100189168
ok big boy show us how to determine what model exactly are we dealing with based just on config and tokenizer.json
>>
>>100189330
good morning sir!
>>
>>100189347
what an amazingly simple implementation, i'll make a pull request
>>
File: 1713494563944602.png (423 KB, 1175x1086)
>>100189330
>>
>>100189430
Can't wait for Llama 3 Nigger Blaster 70b
>>
>>100189471
llama 3 nigger blaster 70b - powered by meta AI
>>
>>100189430
can't wait for llama 3 girls 1 cup
>>
>>100189083
ReMM is old, I think Mlewd is the better option of that era
For something more recent, try mixtral, L3 8b, or use some RAM for a bigger model like miqu 70b or CR+ which is 104b. Koboldcpp has a no-install precompiled binary for Linux, which is a good option for offloading. 12gb of VRAM is very limiting at this point.

Personally I'm happy with slower speeds and a smarter model, and lately I've been enjoying the IQ4_XS quant of command-r+ which ends up at ~55gb. A q5 of Miqu is ~50gb. I used mixtral at q8 and that was around 48gb. A 3.5bpw exl2 quant of mixtral could possibly fit. L3 has tokenizer issues in llamacpp which will extend to koboldcpp, not sure if this affects exllama in ooba.

Mixtral instruct is decent, and there are a few decent merges like typhon. High temp (~3-4), minP of like 0.05, smoothing factor 0.2 w/ a smoothing curve of 4.32 is not a bad starting point for mixtral; it basically adds better variety within a subset of high-probability tokens. In koboldcpp you can ban tokens with the word rather than needing the token id like for ooba, which makes it easier to get rid of shivers, bonds, boundaries, consent and the like.
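If those sampler numbers look weird, this is roughly what they do. The sketch assumes min_p is applied before temperature (temperature last), which is what makes temp 3-4 sane, so check your backend's sampler order; smoothing is left out.

import torch

def sample(logits: torch.Tensor, temperature: float = 3.0, min_p: float = 0.05) -> int:
    probs = torch.softmax(logits, dim=-1)
    keep = probs >= min_p * probs.max()                     # min_p prunes the garbage tail first
    filtered = logits.masked_fill(~keep, float("-inf"))
    probs = torch.softmax(filtered / temperature, dim=-1)   # high temp then flattens the survivors
    return torch.multinomial(probs, 1).item()

print(sample(torch.randn(32000)))  # toy vocab-sized logit vector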
>>
File: file.png (44 KB, 692x565)
another L for Llamasisters
>>
>>100189566
There's qwen 72B right there, losing, and you decide to compare llama3 70B to the 110B model?
>>
>>100189624
cope
>>
>>100189083
>ReMM-v2.2-L2-13B
>undislop
The true /lmg/ experience.
>>
>>100189624
8k context, kys llamacuck
>>
File: leaderboard2223.png (156 KB, 1138x1138)
was gone for a while, did we ever reach a consensus? are we back? vicuna 13b beat a 500b by google, maybe closed source models aren't so invincible after all.
>>
>>100189633
>>100189646
Samefag
>>
>>100189566
It is chinese anon. That means that they copied benchmark questions multiple times into their training data.
>>
>>100189669
>they copied benchmark questions multiple times into their training data.
source?
>>
>>100189658
phi3 7b should beat gpt3.5t i think
>>
Wasn't there a graph showing that Qwen's models were outliers? Anyone saved it?
>>
>>100189699
Chinese DNA.
>>
>>100189699
He cracked a fortune cookie where he found it written in traditional Chinese.
>>
Just tested Llama 3 70B and it's bad and slop.
Back to Claude Opus
>>
>>100189699
https://en.wikipedia.org/wiki/Goodhart%27s_law
>>
>>100189566
So... is there any rp finetune from Qwen models?
>>
>>100189866
Uhh llama bros? Did we just lose?
>>
>>100189937
>just
we've always been losing
>>
>>100188256
Classic /lmg/ just sleeping on this release. Too early to say without using it, but this could be best-in-class for VRAMchads. Has GQA, so if you have the 72GB to run qwen 72b properly, you can run this. Seems to at least match llama 3 in benchmarks. Beats it in the chat evals like MT-bench. Qwen-72b was relatively uncensored, and I don't think they did nonsense like filter NSFW stuff from the pretraining. Before CR+, qwen 72 was my favorite model for RP, even more than miqu. Currently downloading, gonna make my own exl2 quants and report back later.
>>
>>100189950
>>100189937
>>100189866
aicg samefag
>>
How do I get AI to write a song, about shitting your pants, without sounding like some gay medieval bard?
>>
File: file.png (5 KB, 198x123)
>>100190003
>>
>>100189188
two more weeks
w
o

m
o
r
e

w
e
e
k
s
>>
>>100189963
I really liked Qwen72's smartness over Miqu but it has some serious gptslop problems so I dropped it as soon as CR+ came out.
I imagine this one will be smart but needs a kumtune
>>
File: Untitled.png (1.03 MB, 1746x1204)
>>100186423
>agent
I don't get it, I imported absolutely everything that you sent in and all I get is the model repeating something from earlier context. I'm not even at my context limit.

Also, I know it says 32K but I just redid the test at 16K to match your exact settings (I didn't forget the alpha) and got the exact same problem. I feel like this is a problem with the llama 3 ST presets somewhere but I don't know where.

lonestriker Llama 3 chat instruct 4.65 @ 16k
>>
File: apagechink.png (1.52 MB, 1146x824)
>>100189963
>Classic /lmg/ just sleeping on this release.
>this could be best-in-class for VRAMchads.
>>
File: 240409937v1.png (169 KB, 871x582)
>>100189699
Since they have done it before, the onus is on them to prove that they didn't.

https://arxiv.org/html/2404.09937v1
>>
>>100189963
People, if we can even call them that, were shitposting about muh kurisu muh miku muh petra muh whateverthefuckittakestoderail/lmg/ yesterday. It's understandable that quality posters dipped.
Even in this thread, you can see many shitposts.
>>
>>100190126
>add special salsa that makes their models better at math
>"NOOOOOOOOOOOOOO YOUR OVERFITTING ON BENCHMARKS! TIENAMEN SQUARE REEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEE"
>>
>>100190126
That just proves that they trained on math papers, not on the benchmark results
>>
>>100190154
China not nambah wan.
>>
>>100186429
leave.
>>
>>100189963
Nobody is sleeping on it, there are just no quants up yet, are there?
>>
File: MagicalMiraiVancouver.png (1.25 MB, 1184x864)
>>100190010
>gay medieval bard
Its giving you gold and you're upset?
>>
>>100188256
I tried it, it's shit for translation so whatever.
>>
File: file.png (744 KB, 711x533)
>>100190171
>build ghost cities just to artificially inflate some numbers on a paper
>choose not to include benchmark questions in your training data even when it costs them nothing
Yeah right.
>>
>>100190250
good morning sir please do not redeem ze gold and miku upset you bitch bastard thank you sir!
>>
>>100190248
The elites don't want you to know this, but you can make your own quants: just download the weights and run a single command. I have 458 self-made quants.
>>
>>100190250
Oh my stars! Ooh, ooh, ooh *bats eyelashes, bouncing up and down excitedly*, that is Hatsune Miku!.assistant
>>
>>100190297
underrated post
converting/quanting yourself is the way to go. If you've got the bandwidth and scratch space I don't know why you wouldn't
>>
Where are the llama 3 finetunes?
>>
>>100190334
2mw
>>
>>100190154
The same sauce that enabled CodeQwen1.5-7B-Chat to solve 7% of hard leetcode problems, and then fall to 0.9% when tested on problems released after training was complete?
7% beats the best Claude and GPT models. I guess they just had such a good sauce for earlier programming. For some reason it stopped working.

https://livecodebench.github.io/leaderboard.html
>>
I'll be real, Moistral v2 felt like a mess (or maybe I had temp too high there too), but genuinely decently impressed with v3.

Yes, it's dumber than a 70B or a Mixtral tune, but it's not dumb enough that you have regrets.
>>
https://medium.com/@sbutlerg/chinas-ai-breakthrough-sense-nova-5-0-outperforms-gpt-4-on-benchmarks-17b39694ac3c

>Beats GPT-4T on nearly all benchmarks
>Has a 200k context window
>Is trained on more than 10TB tokens
>Has major advancements in knowledge, mathematics, reasoning, and coding capabilities
>>
>>100190388
When can I download it?
>>
>>100190388
The Chinese sure are a trustworthy bunch.
>>
>>100190388
>he trusts chinks
>>
>>100190047
That is a WIP autistic prompt (only for Llama-3) that had huge problems with repetitions once there were like 30 messages.
It was more so to show it following the instructions for the response format.

I think it had to do with the embedded one shot agents blocking progress. I have heavily changed it from earlier.

(updated system + sampler)
https://files.catbox.moe/7j1igs.zip

But even then I don't know if it has been fixed; it is a very experimental prompt that is probably still broken.
>>
>>100190047
>>100190438 (me)
Also changed the regex filters.
>>
I said moistral v3 is a sidegrade to fimbulvetr with better vocab ONCE, a thread or two ago, and now there are multiple retards saying it's equal to a 70b? Sure, the writing feels fresh, but it's still retarded. It's nowhere near a 70b. It's probably between yi and mixtral in smarts.
>>
File: x5.png (330 KB, 1370x330)
>>100190126
>>100190154
I read part of the paper to try and understand what it's doing. So basically they use a method that calculates something they call MIN-K%, which is supposed to predict how likely it is that a model was pretrained on a given set of data. BPC, on the other hand, was more for evaluating general model quality (given the assumption that compression = intelligence). It's not the thing that they're saying proves that the data was in pretraining. MIN-K% is what they're saying is the thing that proves it.
So that image actually is not as relevant to our discussion. Their next graph, which does show MIN-K%, is what we're concerned with.
But in the end our conclusion here is that it's only a chance, as MIN-K% is only about probability. And even then, it's only really MATH and GSM8K. They didn't detect issues with other benchmarks. So at most what we can say is that we shouldn't compare Qwen's math-related benchmarks with other models. But stuff like MMLU is still fair game.
>>
>>100190502
>MIN-K% is what they're saying is the thing that proves it.
I didn't word this well. I meant that it proves it's likely, not that it proves certainty of data being in the pretraining.
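For the curious, the score itself is simple enough to sketch. This assumes the standard Min-K% formulation (average log-prob of the k% least likely tokens; a higher score suggests the text was seen in pretraining), which I believe is what the paper uses.

import torch
import torch.nn.functional as F

def min_k_percent(model, tok, text: str, k: float = 0.2) -> float:
    ids = tok(text, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(ids).logits
    logprobs = F.log_softmax(logits[0, :-1], dim=-1)             # predictions for tokens 1..n
    token_lp = logprobs.gather(1, ids[0, 1:].unsqueeze(1)).squeeze(1)
    n = max(1, int(k * token_lp.numel()))
    lowest = torch.topk(token_lp, n, largest=False).values       # the k% least likely tokens
    return lowest.mean().item()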
>>
>>100190263
what does one even have to do with the other?
>>
Is it unethical to gaslight LLMs by editing their previous messages and lying to them about things that happened outside their context window? asking for a friend (i am ethical)
>>
>>100189560
How the hell do you get 50GB of VRAM? Or are you doing this on your RAM?
>>
>>100190619
2*3090 + 3060
I have 36 by having 3090 + 3060
>>
>>100190388
Didn't Yi 200k only have like 4k effective context?
>>
>>100187803
Do you... not have a gpu? Alternatively, use AtlasOS to shave off a few GB
>>
>>100190614
it's fine if you are interacting with wizardlm-2 or llama3-tier gaslighting model.
>>
>>100190631
I'm not remotely rich enough for this.
>>
File: GMBvoovacAAr2Lb.png (391 KB, 884x444)
>>100190388
I wonder why they didn't compare against the latest GPT-4 Turbo or Opus.
>>
>>100189560
>In koboldcpp you can ban tokens with the word rather than needing the token id like for ooba
How? Last I checked, koboldcpp required the token id as well
>>
>>100190646
I cranked context right to the limit on 34b-200k without issues for a few tasks
>>
File: 1.png (84 KB, 1167x929)
>if
>if
>if
>if
>if
>if
>if
>>
>>100190685
let me guess, you need more?
>>
tried to check out exllamav2 via oogabooba because of the tokenizer bugs in llama.cpp for the first time.
is it supposed to be about 2-3 times slower(sic!) than llama.cpp on a 2070?
q5_k_m.gguf (12-15 tok/s) vs exl2_5_0 (5-6 tok/s)
>>
>>100190685
Kek
>>
>>100190666
same slop as gpt4. call me when chinks drop agi
>>
>>100190366
>after training was complete?
2 weeks ago?
And GPT-4-Turbo-1106 drops from 7.8 to 1.1.
>>
>>100190685
if... it works then I don't care.
>>
>>100190685
Literally nothing wrong with that.
Would you rather use a dictionary and unnecessarily allocate memory instead?
>>
>>100190685
Man, I wish there was an easier way to do this
>>
>>100190718
The weird thing is that I get the same speed in Ooba when selecting its Llamacpp or Exllama as its backend. But for some reason when I try TabbyAPI it's significantly slower than Ooba. This didn't used to be the case but for some reason the latest versions are giving me these results.
>>
>>100190685
Is that code that might take microseconds to evaluate per conversion? Ahh save me.
>>
>>100189963
Not sleeping, just still downloading
And then I'll still need to quant and test
>>
>>100188820
Moistral is finetuned. You can't get writing like that from merging the same models over and over again.
>>
>>100190806
I will download it now and I will test it. And if it isn't 70B quality then I will continue to shit on it in the next few threads just to hopefully stop you faggots from shilling garbage.
>>
starting to see the promise of llama 3 as I get more comfortable prompting it but wlm2 is still the king
>>
How to make llama 3 not slop?
>>
>>100190910
>how to eat healthy at mcdonalds
>>
>>100190968
Surely there are tricks or prompts?
>>
>>100190754
People saw le funny else if meem on tweeter once so they think it's bad.
>>
I can load miqu 5bpw in 48GB with 4bit cache but llama3-instruct OOMs. What gives?
>>
>>100190968
the salad is good and healthy
and don't kek just because it has chicken and dressing in it
>>
>>100190982
[OOC: Stop being shit, thanks.]
>>
>>100190877
why don't you just give me a card you wanna try and i'll test it for you?
>>
>>100191004
different layer size + count, also try to use:
PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True
>>
where would i get started to build one of these bad boys that I can speak to and that speaks back? I'm trying to build an industry specific Alexa basically
>>
>>100191033
google.com
>>
>>100191005
the salad is as healthy as water is healthy : there's nothing in it
it's not fresh, it's been cut from the ground weeks ago then preserved in ultra-low (but not freezing) temperatures, roughly 90% of the vitamins and other good stuff decomposed during that period: leaving something that's effectively flavored cardboard
>>
>>100191048
shut up pussy. salad is healthy. anon said so.
>>
>>100186625
It'll be a great.assistant
>>
>>100191061
fresh salad (eaten the same day as it was cut) is healthy
mcdonald's salad is flavored cardboard
>>
>>100190877
No no no I would rather shit on your shill bullshit. Don't you worry anon you will get your free publicity.
>>
>>100191044
the porn search website?
i mean if i wasn't a fucking retard. where do i find the smart people doing this shit. tell me the top secret ai forum now or else
>>
>>100191033
Whenever I read posts like this one I picture a sociopathic middle manager that just typed something into chatgpt and now thinks he is gonna come here and get a recipe for a bot that will let him fire some people AND increase productivity.
>>
>>100189963 (me)
Well shit, Qwen-110B has such fucking huge MLP layers that exl2 quantization OOMs on this line: hessian_inv = torch.cholesky_inverse(hessian_inv)
Doesn't matter how small the row length is on the calibration set. I think the memory usage is just based on the size of the weight, which they made really large in this model.

Turboderp if you're reading this, is it at all possible to do this inverse distributed across multiple GPUs? I.e. use the combined VRAM of all GPUs to do it.
>>
>>100191118
oh cmon now i'm not a corporate fag I just want a tulpa in my phone to give me some industry specific information and call me nigger on occasion. I know there's a better place to ask questions to learn shit than this shithole where the fuck is it.. Google just wants to sell me ads they don't return real search results
>>
File: 00043-404906828.png (1.51 MB, 1456x1024)
>>100189963
>>100190065
>>100190248
The sad reality is that both the West and the Chinks have their own retarded sacred cows baked into their models. Globohomo-slop is more annoying for RP and most other purposes compared to CCP atrocity denialism.
The real question is... is it any good?
Gonna quant the 110B and find out. Any good ST prompt settings for the Qwen family? I've never used a chink model before.
>>
>>100191165
How much VRAM on the single GPU you're using?
>>
>>100191233
Sounds obvious but tell it to write in English if you get random runes.
>>
>>100191251
I have 4 3090s. When quanting MLP layer, it uses about 22GB for a bit, then says out of memory, tries to move stuff to other GPUs repeatedly then fails at that line.
>>
>>100190631
>>100189560
>>100189645
Having some trouble with the file type. I assume this GGUF thing is what is now state of the art? I still had safetensors in the dusty old model folder.
I downloaded a wrong version right now, I think, and then had an out of memory error, even though the model def fits into the 3060. Is that common, or is my CUDA version maybe fucked up?
>>
File: 1608319661008.png (49 KB, 640x266)
>>100191285
ty for the info anon
will try quoonting on an A6000 and see what happens
>>
>>100187639
Been a while since i saw skipping being explored. thanks for the readings
>>
>>100190877(me)
I downloaded moistral v3 gguf Q8 imat. It is fucking incoherent garbage. Pure llama3 instruct is noticeably smarter and better (and it isn't a fucking frankenmerge).

Like I promised dear shill I will keep posting this message in new threads.
>>
>>100187639
Yay! I knew that idea I had was smart.
>>
>>100191199
If you have an android phone you can install termux, and from there get llama.cpp installed locally. Use http://localhost:8080 for the stripped down prompt interface.
>>
>>100191338
are you surprised? that's why i told you i'd test it for you and save you time. i already have it downloaded and know it's nowhere near 70b level like that fucking retard said. i don't know how you can say it's incoherent though, must be doing something horribly wrong.
>>
>>100187639
What's the difference between this and the varying depth thing?
>>
>>100191405
>i don't know how you can say it's incoherent though, must be doing something horribly wrong.
I just picked up where I was regenning yesterday. LLama-3 understood what was happening. This piece of shit started hallucinating stuff instantly.
>>
>>100191313
GGUF is somewhat of a pain when it exceeds the 50gb file limit and the files have to be split.
>>
>>100191338
>>100191405
>>100191426

Moistral excels in a specific format. Check the README.
>>
>>100191063
Fun fact: This is to some degree a tokenizer issue. If you look at the actual token IDs of "assistant spam", you will find that it says "<|eot_id|>assistant", but the tokenizer you are using fails to decode the special tokens and your generator fails to stop on eot.
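Easy to verify yourself; quick sketch, assuming the stock Llama 3 instruct tokenizer (the eot id should come out as 128009 if I remember right):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
print(tok.convert_tokens_to_ids("<|eot_id|>"))                      # the eot token id
print(tok("<|eot_id|>assistant", add_special_tokens=False).input_ids)
# if the backend never stops on that first id, the turn never ends and the
# model just keeps opening new "assistant" headers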
>>
File: file.png (45 KB, 619x499)
>>100191479
I now tried out this Moistral 11b v3, given that I have zero reference points otherwise. I downloaded the main GGUF and this is like eight models or something. I loaded the Q0_8? Was that right?
>>
File: CyberMiku1.png (1.39 MB, 1216x832)
>>100191479
split ggufs are a thing now, though, so having to cat/copy them together is mostly a thing of the past
>>
>>100191379
>>100191379
ty for the breadcrumbs i'll look into it
>>
>>100191497 (me)

gen 512
>>
>>100191497
>hides behind "your config must be wrong!"
Classic. Your model is shit.
>>
>>100191338
that level of skill issue, holy fucking shit nigga
>>
So llama 3 8B is a good choice for us 20 GB vramlets?
>>
File: smi.png (31 KB, 723x261)
Finally got my dual 3090s rig. Moistral 11B v3 or llama3-instruct-70B? 70B's download size looks yucky
>>
>>100191546
>nigga
At least call me a nigger you limp wristed nigger faggot samefag. Fuck off back to your discord. Work on L3. Base L3 is better than your slop garbage.
>>
>>100191521
Miku a cool
>>
>>100191563
>maybe if I triple down on le ebin nigger slurs it will resolve my skill issue
L3 is niggerlicious, Moistral is white voodoo
skill issue
>>
>>100191507
you're supposed to pick one. the number next to the Q is the level of quantization. bigger number = less model quality loss from quanting. as for which number to pick that depends on model size and how much you can fit into your vram.
>>
>>100191536

regen 512, i like this one
>>
>>100190968
The only people who think mcdonalds is unhealthy are amerifats, also amerifats think salad is healthy because it has basically no calories in it & fatties think calories = bad (since they have no self control over their impulses)
In normal parts of the world (like canada) there's nothing wrong with eating calorie-dense foods like burgers
>>
Is there any other source of no-act-order gptq quants now that TheBloke is gone? It's the only thing that runs on my pascal card...
>>
>llama 3 won't do explicit or sexual content
I'm astonished. Is there a market gap just for that, because the companies want to ruin this? What do they get out of this
>>
>>100190047
It breaks because of the usage of 'System Message Prefix'. It seems that you can only have one <|start_header_id|>system<|end_header_id|>
>>
>>100191688
In 2022, around 30 percent of adults aged 18 years and older in Canada were obese, while 35 percent were overweight.
>>
>>100191648
Coulda saved myself a lot of bandwidth if I had just loaded the Q0_8 then. Oh well.
>>
>>100191559
https://huggingface.co/LoneStriker/Meta-Llama-3-70B-Instruct-4.65bpw-h6-exl2
>>
>>100191698
download full sized model and choose to load it as 4 bit quant?
>>
>>100191724
>The National Center for Health Statistics at the CDC showed in their most up to date statistics that 42.4% of U.S. adults were obese as of 2017–2018 (43% for men and 41.9% for women).
>>
>>100191707
>What do they get out of this
nothing, it's just a humiliation ritual, males are not allowed to be happy in any form of entertainment.
>>
>>100191728
when picking one remember that context takes up vram space as well. also, the selling point of GGUFs is that you can offload parts of the model onto your system ram at the cost of speed
>>
WHERE ARE THE QUEN 110B QUANTS AIIEEEEEE
>>
>>100191707
Your customer support bot ERPing with customers is a bad look.
>>
>llama.cpp doesn't allocate all the memory it needs up front when loading the model, only OOMs once you start generating
Why is it that exllamav2, a python program, can manage to do this, but llama.cpp cannot? What is the point of using C++ and all this low-level shit if you can't even statically allocate all the memory you need at load time?
>>
>>100191773
python is bloat
preallocating memory you will not use is also bloat
>>
So I tried fp16 8B l3 and IQ3 XXS imat 70B and fp16 really is better.... llamacpp is really fucked somehow.
>>
>>100191761
Where the hell did that website with all the character cards go? The red one? God it's been so long.
>>
File: CyberMiku2.png (1.37 MB, 1216x832)
>>100191773
on llama.cpp it depends on the flags you use. If you --no-mmap it will load the model up front, but by default it will mmap the model file and only fault in the required parts of the model as they are accessed, which both starts the gen faster, and tends to give you some data locality benefits.
That said, it should probably check for mem requirements and at least warn when there doesn't appear to be enough. swap etc does make that a bit harder to say these things for sure
>>
>>100191797
>preallocating memory you will not use is also bloat
It WILL use the memory you dumb nigger, that's why it OOMs. Everything in these models has a fixed size. You theoretically know exactly the size of any temporary buffers to do computation. IIRC exllama does exactly that: once the model loads, it has everything it needs already allocated, and memory usage doesn't budge a single MB after that when you generate. This is not true with llama.cpp.
>>
>>100191816
are you talking about chub.ai? not sure what the red one is..
>>
>>100191816
or are you talking about sillytavern? the frontend? that's red
>>
>>100191816
i just found a new local one https://github.com/cyanff/anime.gf
>>
>>100191709
>>100190453
>>100190438
So what would be the best way to fix while retaining functionality? Both seem pretty important but I haven't gotten around to testing the updated prompts yet.
>>
>>100191853
That's what I was thinking of, thank you kindly man.
>>
>>100191917
Seems to be windows only for now
>>
>>100191984
aww wtf
>>
>>100191917
It's a weird thing to shill because it doesn't do anything new.
>>
>>100191924
Currently I have just moved the output format to the system prompt definition and partially start the response with the defined format. This seems to always use it, even when the previous messages didn't follow the format, which would be required if the outputs from the embedded oneshot 'agents' get filtered out.

Made a rentry as it is easier to update:
https://rentry.org/ExperimentalAgentSimPrompt
>>
>>100188820
How do you set up Moistral on oobabooga? It might not be using the GPU, because it's as slow as it gets for me. I have 36gb of ram.
>>
>>100192158
It is horseshit. It is nowhere near a 30B let alone a 70B. Use 8b instruct.
>>
>>100192168
>>100192168
>>100192168
>>
Anybody got a decent llama3-instruct ST preset? I'm trying 70B and it's much more retarded than miqu I think something is fucked with my configs
>>
>>100192202
Quants are fucked. Load 8B fp16, check if it works well, and you will get a working preset.
>>
>>100191233
>sacred cows
i like it
>>
>>100192158

Use the Alpaca format and write a premise for the instructions.

It's really easy to the format wrong which is why some claim it's incoherent.

Like this guy
>>100192177
>>
>>100192266
>It's really easy to get the format wrong
Even if that were true, that would only mean your shit is extremely overfitted and will implode instantly if you go too far from the training set = it is garbage. Go back to your discord tranny.
>>
>>100192301
I'm not the Moistral guy
>>
File: file.png (113 KB, 1182x737)
>>100186538
>>100187626
Also "emergency" is a misnomer making it sound like something used to stop urgently when it's just a parking brake. It does mention its use for parking on hills.
>>
Instruct mode example dialogue in ST is gigafucked, it's impossible to make a usable preset with what we have. Guess I'll have to disable example chats for now
>>
>>100192566
you can also enable `Skip Example Dialogues Formatting` and embed them into your context template.
>>
>>100187059

Is there any TTS where you can control emotion?
>>
ggerganov making some fixes
https://github.com/ggerganov/llama.cpp/pull/6920/commits/a774d7084e5aa75ccb4daad3ac3d53c06c7e2837
>>
>>100192729
You can't trust this guy
>>
>>100192890
>you can't trust the hand that feeds you
Make it yourself then faggot
>>
>>100192984
>bootlicking his masters
good goy!
>>
>>100192993
>masters
>good goy!
It is free...


