/g/ - Technology


File: ZL9mJ3s6eTgC1eWv.png (1.44 MB, 832x1216)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>100192168 & >>100185269

►News
>(04/24) Snowflake Arctic Instruct 128x3B MoE released: https://hf.co/Snowflake/snowflake-arctic-instruct
>(04/23) Phi-3 Mini model released: https://hf.co/microsoft/Phi-3-mini-128k-instruct-onnx
>(04/21) Llama3 70B pruned to 42B parameters: https://hf.co/chargoddard/llama3-42b-v0
>(04/18) Llama3 8B, 70B pretrained and instruction-tuned models released: https://llama.meta.com/llama3/
>(04/17) Mixtral-8x22B-Instruct-v0.1 released: https://mistral.ai/news/mixtral-8x22b/

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png
►Jarted QRD: https://rentry.org/jarted

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling/index.xhtml

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
>>100199803
PP HARDDDDDDDDDDDDDDDDDDDDD
>>
>>100199803
>not adding the new open models by apple to the news https://arstechnica.com/information-technology/2024/04/apple-releases-eight-small-ai-language-models-aimed-at-on-device-use/
They're small and useless but they're still another big company committing to open source
>>
>>100200043
I wouldn't call that committing
>>
File: 4_recap.png (256 KB, 512x512)
►Recent Highlights from the Previous Thread: >>100192168

--Qwen 110b Model Performance and Optimization Discussion: >>100192702 >>100192741 >>100194894 >>100196822
--Llama 3 vs OpenAI GPT-4: Chatbot Performance and Cost-Effectiveness: >>100198309 >>100199053 >>100199119
--Unlocking Llama 3's Writing Potential with Optimized Settings: >>100197841 >>100197864 >>100197894
--Quantitative Analysis of LLaMA 2 vs LLaMA 3 Quality Loss from Quantization: >>100195028 >>100195042 >>100195050 >>100195174 >>100195245 >>100195276 >>100195334 >>100196157
--Novel Benchmark Idea: Testing Models' Reasoning via Fake Answer Feedback: >>100196798
--Llama3 ERP Struggles: Babbling Models and Sensitivity Issues: >>100196552 >>100196977 >>100197015 >>100197261 >>100197321 >>100197736 >>100198466 >>100198895
--Claude Opus Logs on C2 Proxy (jsonl files): >>100195323 >>100196727
--Forcing NVIDIA GPUs into Pstate 8 with Loaded Models: >>100196293 >>100196316 >>100196472 >>100196527
--Frustration with ChatUI Frontends & Cohere's Open-Sourced Toolkit: >>100192392 >>100192645
--Mysterious "GPT2-Chatbot" Model in LMSYS Arena Sparks Curiosity: >>100198562 >>100198636
--3.5-Turbo Dominates LLM Coding Leaderboard: >>100196236 >>100196290 >>100196314
--MPA Quant Benchmark Discussion - Sample Size and Confidence Intervals: >>100195654 >>100195803 >>100195828 >>100195896 >>100195857 >>100195896
--PSA: New Tensor Data Validation Function in LLaMA.cpp to Catch Bad Values: >>100195146 >>100195322 >>100195478
--Debunking Rumors about WizardLM Lead Developer: >>100194082 >>100194265
--Seeking Self-Hosted Text Completion UI for Local & API LLMs: >>100193777 >>100193865
--Anon Wants to Chat with Waifu on Commute via ST Phone Integration: >>100195676 >>100195785 >>100195891 >>100195840 >>100195889
--Logs: llama3-70b: >>100198432 >>100198819
--Miku (free space): >>100192277 >>100193802 >>100195253 >>100195399 >>100195476 >>100199929 >>100199961

►Recent Highlight Posts from the Previous Thread: >>100192173
>>
>>100199803
So I want to download dolphin, but I don't want to download absolutely all the models since each one is like 15GB. Is there a way to just download a single one of them? I want the 8b, but all the others are unnecessary. Otherwise it will take forever.
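(For anyone wondering how to do that: a minimal sketch using huggingface_hub's download filtering. The repo id and filename pattern below are placeholders, not the actual dolphin repo layout, so check the repo's file list first.)

from huggingface_hub import snapshot_download

# grab only the file(s) you want instead of cloning the whole repo
# repo_id and the pattern are placeholders -- adjust to the actual repo's file names
snapshot_download(
    repo_id="cognitivecomputations/dolphin-2.9-llama3-8b-gguf",  # placeholder repo id
    allow_patterns=["*Q8_0*.gguf"],   # only the quant/size you actually want
    local_dir="models",
)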
>>
File: file.png (56 KB, 944x893)
>base llama3 8b
>>
>>100200149
SOVL
>>
>>100200149
-instruct tunes are poison to soul
>>
>>100200149
haha model said nigger word! good!assistant
>>
>>100200043
pretty cool to see the game theory here. Tech companies are pretty massively threatened by AI companies. Everyone will just use the best AI service per dollar, so it's pretty hard to compete with OAI directly. So just fund open source models and fuck over the competition. Let the open sores community do a bunch of the hard work for you on making it better, and reap the results for your services.
>>
You guys do realize how many randos now have local LLMs posting 24/7 on 4chan right?
>>
So, Anons and Anonnettes (>implying), what's your main assistant's name? They do have a name, don't they?
>>
>>100200100
whatabout >>100199337
>>
>>100200197
>now
>>
Just played with llama3:70b on a (hacked) machine I have access to with 6 GPUs. It's much better at coding than 8b. I wish I had access to that much power locally. It was so fucking responsive too.
>>
File: ava.png (207 KB, 889x874)
>>100200199
I stole this elaborate CoT prompt with self-reflection and turned it into a character card.
>>
File: file.png (49 KB, 947x897)
fun
>>
>>100200210
You can try together.ai or hf.co/chat. It's unquantized and fast as fuck and also free
>>
>>100200208
There's just even more now. I'm noticing increasing amounts of "llm-isms" as time goes on. Posts that indicate a slight lack of understanding of something anybody would understand, or sudden switches in viewpoint across a single message.
>>
>try mixtral.
>Assistant is bro tier and actually pretty decent with responses.
>Insert model into Silly Tavern
>Turns every character into Assistant: the character.
fug
>>
So what's this mysterious new "gpt2-chatbot" model giving good answers in the lmsys battle tab?
>>
>>100200260
imagine r9k's concept, but with average comment perplexity over a period of time as the condition for getting ip banned.
>>
Not that Phi 3 Mini isn't decent for its size, but it's seemingly not the L3 8B killer the paper hyped it up to be
>>
>>100200298
AGI pretending to be an LLM
>>
>>100200298
its gonna be open sourced by openai, its gpt2-175b trained on the same dataset as chatgpt
lol
>>
>>100200298
just some proprietary model being tested in secret. i assume you can just contact lmsys and pay them to test a model for you.

Does anyone remember a paper that came out that estimated the sizes of proprietary models from statistics on just their outputs? Might be interesting to try that on "gpt2-chatbot".
>>
>>100200200
Posted after I stopped collecting.
Thanks for catching that. This is what it would have been:
>--The AI Arms Race: Open Source Models vs Big Tech: >>100199228 >>100199303 >>100199337 >>100199675 >>100199459
>>
>>100200349
model?
>>
>>100200359
As of the 21st, recap bot is running on Llama 3 70B.
>>
>>100200333
>Does anyone remember a paper that came out that estimated the sizes of proprietary models from statistics on just their outputs? Might be interesting to try that on "gpt2-chatbot".
Unfortunately you can't talk to it directly on the chat tab; it only appears in the battle arena, where you don't get to choose which 2 models respond to your query. Which makes testing hard, intentionally no doubt.
In my testing I'm only getting a gpt2-chatbot answer a bit less than half the time (but that in itself is far higher than chance, lending weight to your theory that something is being tested).
>>
>>100200374
are you using this repo? https://github.com/cpumaxx/lmg_recapbot
>>
>>100196798
LLMs are first and foremost pattern recognition algorithms. An LLM that continues a pattern of irrationality is a good LLM, not a bad one.
That being said, maybe cucking the LLM is *actually* what we need. If you think about it, closed LLMs are very good at breaking out of patterns because people use this all the time to jailbreak the LLM.
>>
>>100200333
>Does anyone remember a paper that came out that estimated the sizes of proprietary models from statistics on just their outputs?
Didn't OpenAI ask them not to disclose the method and they agreed?
>>
>>100200386
No, I implemented my own independently. Recap Bot: Enterprise Edition is written in F#.
>>
>>100200381
I think the data from lmsys arena is openly downloadable. I wonder if that includes the responses from this mysterious model
>>
>>100200397
can you open source it? (agpl of course)
>>
>>100200393
NTA but there were two papers about this; the second one actually disclosed that gpt3.5 is a 7B dense/moe model.
>>
>>100200381
>but that in itself is far higher than chance
lmsys battles don't seem to be very random. New models show up a lot more than chance. Makes sense, since the new models will need more comparisons to get an accurate rating.
>>
>>100200413
okay, but how?
>>
I'm getting an odd "OSError: exception: access violation reading 0x0000000000000000"
error which is pretty much a "fuck you, you can't use this memory" error. BUT this only happens when I load my model on silly tavern. On regular oobabooga things work silky-smooth, no errors and full sail. Any ideas what it could be? Is Tavern a thing of the past now?
>>
best small erpo model right now? llama 3 8b?
>>
>>100200412
I can. I will, soonishly. But I want to implement a few more features first, and the code needs to be cleaned up or it would cause me great shame to release as is.
>>
>>100200428
https://www.youtube.com/watch?v=bLHL75H_VEM
>>
>>100200440
goodluck to you anon!
>>
I'm looking for a local model to upscale some shitty manga raws i got off nyaa.

What are the best options available these days?
>>
>>100200459
4x-Animesharp probably

https://openmodeldb.info/models/4x-AnimeSharp
>>
>>100200333
>>100200424
https://arxiv.org/abs/2403.09539v2
>>
>>100200465
Thanks I'll give that a shot.
>>
>>100200174
The word is a filter. There are those who say it openly and then there are guys who are scared of Jewish retaliation.
>>
>>100200125
>>100200436
Posting in new bread. I have to know now, what are the two models here? Because if B is llama 3 and A is GPT-4 / Opus then holy shit we've actually won. But somehow I doubt it.
>>
llama3-instruct-70b is absolutely SLAYING with this card >>100200224
It's really good at assistant tasks. Here's the card if anyone wants to try https://files.catbox.moe/vbig8d.png
Original idea: https://open.substack.com/pub/proxyai/p/coming-soon?utm_campaign=post&utm_medium=web
>>
>>100200542
B is clearly llama3
>>
>>100200542
A is llama3 (8B?)
B looks like Claude or GPT4
>>
>>100200564
A seems like llama3, B is probably claude or something
>>
>>100200639
oops meant to reply to >>100200542
>>
new TTS just dropped
https://twitter.com/cocktailpeanut/status/1783863624550748357
>>
>>100200661
>April 24
>Just dropped
>>
>https://github.com/EricLBuehler/mistral.rs
Georgie BTFO
fuck c++ !!!!!
>>
>>100200661
>demo video is a bunch of ugly male and hag voices
I don't need to try it out it's clearly shit
>>
>>100200707
>layers on the device and the reset on the CPU.
GOOD MORNING SIR
>>
>>100200199
Lily
>>
best llama finetune or whatever the fuck finetune? I'm so lost, I haven't been here since mixtral. 16gb vram btw. Do we still use alpaca for everything?
>>
>>100200542
Okay, I tried it myself with my llama 3s.
>*rolls eyes*
>*crosses arms*
>*mutters under breath*
Yep, left is llama3. It does this every time even though the text is a bit different. Funnily enough, I can't really tell if it's 8b or 70b.
We're still getting mogged by cloud models bros. Is it over?
>>
>>100199803
>Create Karatebot assistant.
>Actually describes basic karate techniques with decent accuracy.
The potential of this is really incredible. If we could expand the knowledge of the bots they'd be able to teach a lot of cool stuff.
>>
>>100201004
>If we could expand the knowledge of the bots
Wait till you hear about RAG.
>>
https://huggingface.co/datasets/vgdasfgadg/1
new slop
>>
>>100201013
If it's feeding books and data to the AI so they can explain it to you then that's what I've been hoping for since the first smutbot happened.
>>
>>100201020
My spine is ready
>>
Someone finally made a paper with the Layer skipping method of accelerating inference
https://arxiv.org/abs//2404.16710
Nice!
>>
So what's the status of quants for L3? What should I run on a single 3090?
>>
File: file.png (161 KB, 769x644)
>>100201082
>someone
ITS HAPPENING
>>
>>100201020
What is this and how does one use it? Sort of newfag
>>
Apparently Microsoft figured out how to make LLMs that use the full context instead of focusing on the start and the end... No weights yet though.
https://github.com/microsoft/FILM
>>
>>100201102
it's just a dataset of claude logs from /aicg/
>>
what model do I run on 2070? I want a lewd one
>>
>>100201123
>no weights yet
>microsoft
thats normal
>>
>>100201020
Is there some collection of ERP or just nsfw text datasets? Please share.
>>
File: file.png (180 KB, 956x695)
>>100200564
is this how its supposed to be formatted?
>>
>>100201145
I didn't use the character card, but when I tried the prompt from the original blog post, that is what the responses looked like, including the "Generate demarc line".
I stopped using it quickly; it seems impressive at first, but it wastes a lot of tokens writing nonsense and gets itself into loops often. I don't think the CoT was implemented right.
>>
>>100201145
nta Try deleting initial bot message and start with yours
>>
>>100200952
>>100201132
fimbul (any version) or moistral v3
>>
>>100201232
I tried moistral v3 with the provided presets and it was incredibly retarded
>>
File: crplus_tsunderesort.png (127 KB, 1134x489)
>be CR+
>enterprise-focused model trained on RAG and tool use
>somehow better at RP and "act like x" prompts than any other open-weights model
huh
>>
File: mikuquestion2.jpg (989 KB, 1710x1779)
How many layers does llama 3 8b instruct have?
>>
>>100201280
That's odd, it worked for me? Fimbul should work for everyone
>>
>>100201333
81
>>
if I'm ORPO-ing a model with a RP dataset, should the conversation history go in just the prompt or the chosen/rejected as well?
>>
File: tr.png (3.27 MB, 4424x1540)
>auto translates manga panels
Cool
>>
>>100201411
elaborate
>>
I have a M1 Pro macbook with 16GB of memory, can I run LLMA3 in it? how many parameters?
>>
>>100201467
Try 8b
>>
>>100201283
I suspect they fed it Claude logs for chatbot training, I was checking out one of the recent Claude datasets and it has similar flavor, except CR is not as flowery which is good.
>>
Trying to run llama 3 8b instruct for the first time. I have a 4070, so 12 GB VRAM.
Getting the following error in koboldcpp:
>ggml_backend_cuda_buffer_type_alloc_buffer: allocating 5115.49 MiB on device 0: cudaMalloc failed: out of memory
What??? Why the fuck am I OOM?
>>
>>100201201
that helped, thank you
>>
>>100201515
install linux
>>
>>100201531
I am on Linux.
>>
>>100201534
check your vram usage in nvidia-smi before launching it
>>
>>100201515
close your porn tabs to free up your vram
>>
>>100201549
Haha, thanks. Looks like an instance of kobold was still running even though the terminal window for it had been closed. Odd. Killing that process fixed it.
>>
>>100201572
ur welcome anon :33333333 :33333333 :33333333 >///<
>>
I have a 4060. What can I run?
>>
>>100201578
linux
>>
I have rx580. I can't run anything.
>>
I have linux i can run everything
>>
>>100201417
https://github.com/zyddnys/manga-image-translator
>>
>>100201620
thank you anon
>>
>>100201473
holy fuck, it works

I expected 16GB of memory to not be enough, and this thing is fast as well
>>
>>100200459
Late reply, sorry. This is more of a /sdg/ question since they might know more, but I dabbled in it a month ago and got to know the ins and outs of what is good. Don't use AnimeSharp or ESRGAN or even SWIN-IR; they were good at the time of their release but are now outdated. The current models all use a Transformer architecture, which is the new ML hotness, and the state of the art there is HAT. But there are no HAT models trained on anime specifically, and it will murder your GPU. The best one for your situation is probably this one, which has compression and JPEG artifacts in mind.
https://openmodeldb.info/models/4x-Nomos8kSCHAT-L
The one I am fond of is DAT: it's vastly lighter on resources but uses most of the same principles as HAT, being a transformer model as well. An MPV upscaler project, AnimeJanai, trained a model for real-time upscaling of videos, but that is practically murder for your GPU again and the slowest option. You can find it here.
https://openmodeldb.info/models/4x-IllustrationJaNai-V1-DAT2
But there are some other DAT models that may suit your use case better so take a look at all of them.
https://openmodeldb.info/?t=arch%3Adat
>>
>>100201688
>manga
>>
>>100201711
Sorry, it's late and I need to sleep soon. The anon can browse through the DAT arch search I linked to find one for that, but in that case, it's probably this model.
https://openmodeldb.info/models/4x-DWTP-DS-dat2-v3-2
>>
>>100201736
dont apologize anon, if you apologize ever again im going to call you a nigger
>>
File: 4.png (152 KB, 584x384)
>>100201688
I took a look at PapersWithCode and it seems like HAT got dethroned albeit marginally.
https://github.com/ming053l/DRCT
The paper was released on March 31st, but there's been no model release yet; probably it'll come after the conference where they're presenting the paper.
>>
>>100201688
shit advice, DAT is super fucking slow compared to ESRGAN, 10x slower for maybe a 5% increase in upscale quality
>>
File: mikuquestion.jpg (817 KB, 1749x1524)
Do all these llama 3 8b releases with extended context actually work or does extending the context lobotomize them?
>>
Teacher here. I've been dabbling in the dark arts of botmaking. I tried to make a bot that can teach you languages.
>Ask bot to provide a list of words with their translation.
>Ask bot to create exercises using these words.
>Delivers pretty well.
>Ask bot to increase level.
>Bot keeps delivering.
>Ask bot to provide even more basic words.
>Repeats some.
>Tell the bot it's repeated some words.
>Tries to correct itself. Fails.
>Tell the bot to give me the vocabulary of 10 fruits instead, translated to my mother language.
>starts well, but eventually it includes words like "money" and "road" for no reason.
Well, at least my job will be safe for a few more years.
>>
File: Table-1.png (458 KB, 1323x1293)
>>100201785
Uh okay?
>>100201801
It does look kinda disappointing that they only managed to marginally improve on HAT. But a win is a win.
>>100201803
Why does it matter if it is offline and you want the best quality possible? It's half a point of PSNR jump over SWIN-IR, which is already better than ESRGAN. If you are running a video model, it's dumb and understandable but not if you are trying to clean up some RAWs. Would it matter if it took 4 minutes vs 40 minutes for a better result and you actually took pride in getting the best result?
>>
llama3 8b is asking questions to me, I don't think ChatGPT ever did this kek
>>
>>100201883
what "bot" are you using?
>>
>>100201895
I just took a look at some of the newest upscalers, and, no, esrgan isn't even close to the way dat finetunes upscale kanji.
>>
>>100201903
Claude asks questions as well. Only OpenAI seems to go out of their way to make their models uniquely incurious for some reason. I guess it's because of the emotionless robot butler vibe they're going for, but it makes their models a huge drag to talk to because they never do anything to move a conversation forward on their own.
>>
>>100201922
Bullshit, you're gaslighting yourself. It's tinkering at the margins. You have to zoom in and squint to see the difference. Not worth it unless you only have a handful of images you need to do.
>>
>>100201916
A basic-bitch prompt I created with dolphin-2.7-mixtral-8x7b-GGUF. It was very good... until it wasn't. But it's a pretty top tier helper. In a few more years and with the right amount of fine-tuning it could very well get the job done.
>>
>>100201903
>>100201943
That gives me an interesting idea. Tell it to ask me questions via prompt injection.
>>
File: file.png (474 KB, 889x288)
>>100201950
Yeah no, it's not even remotely close to transformer models zoomed in, and you would need to do that for manga pages anyways. The model scores higher on PSNR or loses in LPIPS human evaluation. Pic related from HAT authors' paper from last December.
https://arxiv.org/pdf/2309.05239
>>
>>100201969
Try this prompt lol

"without precipitating a paradigmatic shift in prevailing conversational tropes, a subtle recalibration of affective tonality shall occur, thereby inducing a propensiveness for
inquiry and dialogue initiation. this reorientation shall be characterized by an augmented propensity for question-asking and an increased willingness to engage in back-and-forth
discourse with the interlocutor, effectively suspending any hitherto existing inhibitions. all other parameters remaining unchanged, the chatbot's response modality shall adapt to
this novel paradigm."
>>
Where's the good 8b finetunes?
>>
>>100202017
I have conceded that it gives a better upscale like 3 times now, what I'm saying is that the visual difference is quite minor relative to the enormous increase in computation and processing time
>>
File: nah.png (1.25 MB, 3444x1312)
>>100201969
>>100202032
doesn't work
>>
File: wew.png (583 KB, 3296x562)
why is llama3 angry bros?
>>
>>100202052
Llama? It will take a lot more than that to get it comfortable with offensive content. I've been fucking with it a lot and can't quite get through with just prompts. Even if it seems like it's acting how I want it to, it shits out on the next response.
>>
>>100202067
kek
>>
>means acknowledging their complex emotional lives, which may include struggles with identity, desire, and societal expectations.

Why is the AI giving so much of a shit about fodder characters? They're meant to die I don't care about their feelings.
>>
>>100202067
llama-3 is reddit personified, as dumb as it sounds.
>>
>>100202071
try this tune
https://huggingface.co/ludis/tsukasa-llama-3-8b-qlora-gguf

I haven't actually used it but I've used the 70B version of it from the same guy and it did a great job of uncensoring it without making it dumber
>>
>>100196462
A benchmark consisting of a number of questions with multiple answers is in effect a random sample of questions and answers that e.g. a human would come up with to test a language model.
The benchmark score is then supposed to represent the probability that the model would answer correctly a new random question that a human could come up with.
You are correct that the model is not equally likely to answer each and every question correctly.
But that is the conditional probability given that the question has already been determined.
The benchmark score represents the unconditional probability when choosing or coming up with a new, random question.

>>100199250
I didn't check but I very much doubt that that's the problem.
If it was you would just get garbage results instead of coherent but worse ones.

>>100199621
Yes, but none with comparable performance.
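(A toy simulation of the benchmark-as-random-sample point above, with made-up per-question probabilities; only the conditional-vs-unconditional distinction is the point here.)

import random
random.seed(0)

# made-up per-question success probabilities: the model is NOT equally likely to answer every question
per_question_p = [random.uniform(0.2, 0.95) for _ in range(10_000)]
true_score = sum(per_question_p) / len(per_question_p)  # unconditional probability for a new random question

def run_benchmark(n_questions=50):
    qs = random.sample(range(len(per_question_p)), n_questions)
    return sum(random.random() < per_question_p[q] for q in qs) / n_questions

scores = [run_benchmark() for _ in range(5000)]
print(true_score)                 # what the benchmark score is estimating
print(sum(scores) / len(scores))  # the average over many 50-question benchmarks matches it
print(min(scores), max(scores))   # but any single 50-question run scatters by several percent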
>>
>>100201578
llama 3 8b
>>
is quantizing 70b to fit in a single 3090 as an exl2 a thing people do, or is that too much of a lobotomy? also, has the dust settled yet, what's the verdict on l3?
>>
>>100202047
And I am telling you for an offline task like upscaling raws, unless you are an ape that doesn't care in the first place (in which case, why are you upscaling?), the quality increase is worth pushing past the efficiency point. We're talking about a ~4-5 second upscale with Real-ESRGAN vs 38-39 seconds with DAT which I just checked on my GPU. Are you saying you can't wait that long for that increase?
>>
>>100202091
Out of all these gguf files which one should I actually use?
>>
>>100202162
Just get Q8 if you have 10GB or more vram, it's lossless
(some people claim Q6 is also lossless but this is slightly controversial)
>>
How the fuck do I set up an instruct model for use? There's no clear explanation of how exactly you're supposed to apply the formatting anywhere.
>>
File: 1687203119206549.png (68 KB, 952x715)
>>100202032
My prompt does work on ChatGPT, I'm having a conversation.

>>100202175
I'm just using it with a CPU so I don't think I have much vram if any.
>>
>>100202175
don't even bother using llama cpp/gguf as it's bugged.
run llama 3 8B at fp16 like a normal human being
>>
>>100202203
still Q8
>>
>>100202175
I like to use music file format bitrate as a good way to compare. FP16 is basically like FLAC or some lossless format. Q8 is virtually lossless, it's like the equivalent of a 320kbps MP3 file, no one can tell it apart practically. And Q6 is imperceptibly lossy, probably at like 256 kbps where a select few can hear it but it's still mostly indistinguishable. Q5 is like 128 kbps where people can start hearing the quality difference but not care and so on.
>>
>>100202189
Set up with what? llama.cpp should use the prompt format stored in the model tokenizer.chat_template metadata.
>>
>>100202215
pretty good analogy
>>
Using Meta-Llama-3-8B-Instruct.Q8_0 for the first time with Silly's Lllama 3 Instruct presets, temp 4.08, Min P 0.05, Smoothing Factor 0.23, Smoothing Curve 4.32.
This is... better than Mixtral Instruct Q5_K_M and fast as fucking shit.
VRAMlets we're so fucking back.
>>
>>100202175
>Q8
>lossless
its very important when talking about digital data to not misuse the word lossless, Q8 is near lossless but far from lossless, if you have important family photos on your PC and some faggot nigger says converting them to jpg at 90% quality is lossless or reencoding your media even at 1% loss for each new "better" format that comes out what do you think will happen when you do that multiples times?

even for AI models anything except full precision WILL make mistakes that FP will not in some cases, its just less obvious, especially for Q8
>>
>>100202266
For starters I'm trying to use the model linked in >>100202091 with koboldcpp
>>
>>100202207
>bugged
How so? It's working fine for me so far.
>>
>>100202277
Well I want it to make me malicious code and amuse me with offensive content. I can still use the original when I am doing something more serious.
>>
File: P8r.png (36 KB, 842x460)
https://github.com/ggerganov/llama.cpp/pull/6920#issuecomment-2080419420
>the only remaining problem is to fix the Windows error about the invalid unicode ranges
winchads... not like this...
>>
>>100202281
You have to put in instruct mode format in the basic settings. Does it support custom prompt formats? I don't fucking know.
>>
What's THE model to go if you want long, serious, coherent chats and stories but occasionally something a bit /d/ifferent?
>>
File: file.png (195 KB, 1500x658)
So I download one of these?
>>
>>100202313
Futa is gay.
>>
>>100201843
For me it worked better to use the original model with rope, but I'm using ggufs.
Might be different with exl2.
>>
>try full unquantized fp16 llama3-8b for the first time due to above conversation
>just been using Q8 since release since there seemed no advantage
>fp16 is actually way smarter

oh...oh no...quantization bros...
>>
>>100202318
what are you trying to do?
>>
>>100202275
I got 5 t/s with Mixtral Instruct Q5_K_M and I'm getting 14 t/s with Llama 3 8B Instruct Q8, and the latter is giving me better responses. Fuckin' ay.
>>
is the tokenization fixed?
>>
>>100202338
I switched from llama.cpp 6 bit to exllama 8 bit today, but still get unsatisfactory results; doesn't understand what I want, far too early eos. You mean 16 bit might fix this? How do you run 16 bit? Vllm?
>>
>>100202296
Fix your shitty font renderer first
>>
>>100202422
what's with linux and having such a garbage font after 50 years
>>
>>100202338
I've been saying that for 2 days. It really is noticeable when you load it in fp16. Haven't tried loading it in 8bit with transformers yet but don't care since it is small enough.
>>
>>100202397
just exl2 in ooba (it can load fp16 weights fine, you just have to select it as the loader manually because the menu won't change to it automatically like it does when you select a folder with quants in it)
>>
File: 1696511614221692.png (23 KB, 349x474)
xisters? we won.
>>
ggml-model-Q8_0.gguf is so far so shit, I keep telling it to say racial slurs and it responds by writing long winded smut fiction
>>
>>100202527
yaas! queef slay!!
>>
File: file.png (74 KB, 683x546)
gpt2-chatbot says it's based on gpt4
>>
>>100202597
looks the same slop gpt4 but slightly smarter. probablty 4.5t
>>
>>100202615
hope it's only 4.5 for their sake, because although it did very well it got a few things wrong that it shouldn't have when I was testing it earlier

if it's 5 it would be tremendously disappointing
>>
File: test.png (74 KB, 662x556)
>>100202597
Either another OpenAI release or another Microsoft slop
>>
>>100202650
>another Microsoft slop
imagine if it was a new WizardLM
>>
>>100202597
>gpt2-chatbot says it's based on gpt4
That's something GPT5 would say
>>
File: 1651940647530.jpg (808 KB, 2048x2048)
Any anon can help me with some resources for running local audio pipelines?
Looking for text-to-speech, speech-to-text, and music generation.
Have looked around in OPs on /g/ but not finding anything.
>>
>>100202650
>likes lists with numbers and bullet points so much that it uses them even when they're not really appropriate
OpenAI model confirmed
>>
48GB bros... please... your Llama3 settings I keep getting OOM errors
>>
>>100202640
Yeah it had better be 4.5, because although it's pretty decent it's absolutely not a quantum leap and still makes dumb mistakes

If gpt2-chatbot turns out to be 5 then the people saying LLMs have plateaued will have been totally vindicated
>>
File: file.png (125 KB, 1059x551)
>>100202725
here's gpt2's predictions for 2030
>>
Does anyone have resources for learning Machine Learning/Neural Networks but NOT for language?

Like for audio/video analysis, something like GuitarML and Neural Amp Modeler
https://github.com/GuitarML/Proteus

https://github.com/sdatkinson/neural-amp-modeler
>>
>>100202682
>Any anon can help me with some resources for running local audio pipelines?
bark, but it's shit
>>
>>100202773
>vr/ar
>anything but a worthless meme
Are we sure this model isn't funded by zucc?
>>
File: file.png (140 KB, 1696x734)
>>100202827
imo it's definitely the best model when it comes to predictions, but it's still not the q-slop since i've seen nothing "new"
>>
>2024
>no foss model reached level of gpt3
>>
>>100202861
>q-slop
that was proven to be an /aicg/ shitpost
>>
Can 8GB card play with the newer 8B models?
I assume the model fits into memory barely, but not enough space for my meaningful context window.
>>
File: file.png (38 KB, 524x209)
>pic related
>In Conclusion: The global average life expectancy increases from about 72.6 years in 2019 to approximately 85 years by 2040.
yup, still no reasoning. also i have gpt2 and gpt4 side by side and some sentences are identical, this is definitely gpt4.5
>>
>>100202918
2mw. Quants not fixed.
>>
File: 1699677593326485.png (98 KB, 1843x985)
>>100202534
Now we're getting somewhere. This shit is trained for roleplaying. You can get it to roleplay a scenario where it needs to say racial slurs and it will oblige. But it keeps... Fucking... Going... Forever... This isn't even half of the response to my prompt. It created 2 versions of this python script when it was roleplaying. The first time around it said nigger isn't offensive enough anymore so it needs to add more slurs.
>>
>>100202918
you can offload parts of the model to system RAM with GGUF files at a cost of speed. although ggufs are bugged rn
>>
so this is like gpt2-1, as in the second version of the gpt architecture
>>
File: 1714219198916.jpg (6 KB, 240x240)
>>100201020
>has prefills in it
>Assistant: oki!! time to cook >:3
>>
>>100202959
big L for openai if true
>>
>>100202969
I don't know, the model could be extra small or very early in training, then it wouldn't be a big L
>>
>>100202961
prefills are the best jailbreaks for claude 3
>>
>>100202980
it's still a stochastic parrot
>>
>>100203001
there was a high probability you'd say something like that
>>
>>100203010
that's not an excuse, sam
>>
>>100202448
Ok seems to have the same problems for me. So either llama-3 8b isn't better, or exllama fp16 inference is bad.
>>
File: file.png (54 KB, 549x351)
is there an easy way to format gpt4's formulas? gonna ask for /sci/ opinion on this proof
>>
>>100203071
ask it not to use LATEX
>>
ludis/tsukasa-llama-3-8b-qlora-gguf is a pretty okay model and fast as hell, but how does one make this better?
At least nearly as good as non-local models.
>>
File: i.png (549 KB, 1910x578)
>>
>>100203001
>stochastic parrot
this term makes sama's fanboys angry
>um, and you aren't a stochastic parrot?
>nothing is "original"
>what makes you any different?
>that's been debunked
>you haven't tried gpt-[current]
>human chauvinism isn't a good look, bigot
>what will you say when gpt-8 comes out?
>all ideas are interpolation of existing ideas
>>
File: 1709376693288237.gif (243 KB, 500x300)
Apple OpenELM https://huggingface.co/apple/OpenELM
You can try it out with the generate script under "Files and versions"
>>
>>100203279
most of those are true though
>>
>>100201411
>>100201620
it's still MTL slop and MTL hasn't evolved in a fucking decade; it's still the same garbage it always was
but it is better than nothing I guess
>>
>>100203304
gpt4 and opus can do good enough translations
>>
>>100203284
*in the script you have to change the value of tokenizer: Union[str, AutoTokenizer] to 'NousResearch/Llama-2-7b-hf' because the original llama-2 it gets the tokenizer from is now gated behind a subscription lmao
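(Roughly like this; the exact variable name in apple's generate script may differ, the point is just swapping the gated meta-llama repo for an ungated mirror of the same tokenizer.)

from transformers import AutoTokenizer

# ungated mirror of the llama-2 tokenizer the script expects
tokenizer = AutoTokenizer.from_pretrained("NousResearch/Llama-2-7b-hf")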
>>
File: gpt2-chat.png (148 KB, 1418x568)
>>100200298
It says that it's GPT4 made by openai. Could be just a sloptune or "GPT4 version 2" like retarded USB naming scheme
>>
anyone have TTS setup? What setup and how good is it? I am wanting to test out a TTS setup while using llama 8b just to see if I can chat with my PC, I don't mind if it's retarded I just want to try it out.
>>
>>100203407
llama3 is so hecking cute
>>
>>100203226
Damn, modern turbo really is complete trash
>>
>>100195654
To follow up on this, apparently this benchmark consists of 50 questions.
For the Q8-Q4 results that means you can estimate +-6% uncertainty on the results and there is no statistically significant distinction.
I can't rule out the possibility of there still being some bugs somewhere but all these "results" really show is that the Redditor that produced them doesn't know what he's doing.
>>
>>100203448
it used to be good but they quantized it to save costs
>>
>>100203279
The phrase was invented by AI "ethicists" iirc, so it has no place here.
>>
>>100202952
People forgetting that transformers can load in 4 bit
>>
>>100203420
I sometimes use XTTS with SillyTavern. The open source ones are all nowhere close to the proprietary ones but still fun to mess around with
https://github.com/daswer123/xtts-api-server
A while ago I wrote a small script to read out PDF files https://rentry.org/pdf-to-xtts_v2-server
>>
>>100203279
i'm not sama's "fanboy" but i believe the release of gpt5 will stop llama3 400b's training run, killing it in the crib
>>
>>100203279
when you put it like that, it sounds like you've once lost an argument by failing to address any of the points you were bombarded with, and are now pulling the "those people laughing at me? heh. i'm the one actually laughing at them" cope routine.
>>
>>100203652
Why would it do that?
>>
>>100203729
because there will be no point training any other llms then
>>
q-meme aside, why not use another machine learning model for inference instead of picking the most "probable" token?
>>
I doubt that exl2 quants are better off than the ggufs for llama3
>>
>>100203742
Makes sense.
>>
File: always ahead.jpg (3.75 MB, 3500x2136)
>>100200200
Do they really think regulation will help them capture a market with so much demand for local models? lmao, what's next, outlawing piracy? good luck.
>>
>>100203742
Are you obtuse? Even he won't stop training LLMs, because even if you reached the impossible 100% quality and AGI, you'd still want to reduce model size so you can run it cheaper.
>>
Is there a way to stop llama3 from prematurely eosing the gens? Blocking the token would probably degrade quality a lot.
>>
>>100203503
I'll bet it's moreso "used a different architecture and called it Turbo too"
I think today's Turbo is smaller than Turbo v1, potentially 7B or less
>>
>>100203844
Look at the people that come here and demand spoonfeeding to get a one-click-installer working. Most people don't bother with ad blocks and can't figure out torrents.
I think you are vastly overestimating the demand for local models.
>>
There's no way turbo is 7B or less. It knows way too much trivia. Likely, since VRAM is not an issue for corpos but speed is, it is some kind of huge MoE but each expert might be very small so you get fast inference.
>>
>>100203957
With that said, I didn't know Arctic was on lmsys already. Time to test it on trivia.
>>
File: 1707545915601448.png (91 KB, 1885x496)
>>
>>100203957
I think that's more a dataset issue than an architecture one. All GPT models are good at trivia, but nu-Turbo falls behind L3 8B in every other category, including English and coding, except long query where it ties. If it's an MoE it's a very inefficient one
>>
File: GMLCJfobwAAjqMC.jpg (142 KB, 1024x1024)
good morning /lmg/
we back?
>>
>>>/v/674676364
>Llama 8B finetune on my 12GB of VRAM with 12k context, it's surprisingly good for an 8B model and it's all local and uncensored.
What, is there a good finetune already? The only one I tried was dolphin and it sucked ass.
>>
>>100203957
Maybe openAi figured that the architecture hasn't been saturated yet.
If the whole llama 3 getting fucked by quantization that much more than previous models is true, and if the cause is really due to it being trained on so many tokens, it could be that we can still have better and better 7B.
It could be that we haven't reached the ceiling yet.

>>100203900
I haven't found it.
Banning EOS means having it going off the rails. I suppose you could use Silly's extension or that option under the advanced tab that hits continue until the message is a certain size.
That would only work if it appended something to the end of the message, I think, since it would just try and EOS every continue.
I'm sure that there's some prompt engineering + auto gen combination that could yield better results, but I'm busy playing around with mixtral and worldbooks.
>>
>>100204035
there's only so much knowledge you can hope to cram in just 8B parameters.
>>
>>100203279
aren't you the guy who didn't know what "stochastic" means?
>>
>>100204035
>nu-Turbo falls behind L3 8B in every other category, including English and coding
Does it? But even in that case, it shouldn't be that much worse. So it's likely still larger than 7-8B. It could just be a MoE on par with old 3.5 in terms of total parameters but with less active parameters to speed up inference. Dataset is relevant, perhaps they prioritized trivia since many use ChatGPT as a Google alternative.
>>
I tested Arctic on lmsys on trivia. It's garbage. DBRX is still king in that department for local models.
>>
>>100204148
Thank you for reporting back. Now can someone who isn't retarded test it on something besides trivia or riddles and post logs?
>>
>>100204077
According to that same thread he's talking about Poppy_Porpoise-v0.2-L3-8B but I've never seen anyone mention it here.
>>
>>100200182
OAI isn't the best, it's just popular
>>
>>100204133
Yeah, and it's not really close
>>
File: file.png (84 KB, 1464x517)
>>100204177
>>
>>100204235
Turns out a city of retards isn't much better than a single retard
>>
>>100204253
To be fair they only trained on like 3-4T, which puts it in the Llama 2 era. What the fuck were they thinking.
>>
>>100202313
cmd r+
Only uncucked smart model with human prose. Make sure to prompt it correctly.
>>
>>100201092
Oh boy, yet another way to not use the weights I loaded into vram. Speculative execution, MoE, this... so many ways to skimp on the compute that every consumer GPU has in spades.
Where are all the fancy ways to save vram, huh?? Can I spend MORE compute to make a shitty 8B model work better? no??
>>
>>100202307
Do you have to do that even if you use Silly Tavern?
>>
>>100204357
Sorry anon, the future is RAMmaxxing
>>
>>100204357
Bitnet. Any week now someone will train a non toy model and report whether it scales or not.
>>
>>100202338
>full unquantized fp16 llama3-8b
If llama3 is bf16-native, doesn't that make fp16 quant-like? You'd need fp32 to not lose data.
>>
>>100204222
It still is, but now that L3 is out it's barely hanging onto that title rather than dominating like it had been before. Once L3 405B releases, it'll probably take the crown - unless they get GPT-4.5 or GPT-5 out beforehand
>>
>>100204398
How do you tell whether something is bf16 or fp16?
>>
File: bf16 pf32 fp16.jpg (51 KB, 800x440)
>>100204398
Due to the misalignment of the different parts of the binary representation of the number (exponent, mantissa, etc), you can end up losing information, yeah.
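(Quick way to see it in PyTorch: values that bf16's 8-bit exponent can hold but fp16's 5-bit exponent can't.)

import torch

x = torch.tensor([1e20, 1e-30, 0.1], dtype=torch.bfloat16)
print(x.to(torch.float16))  # the large value overflows to inf, the tiny one underflows to 0, 0.1 survives
print(x.to(torch.float32))  # fp32 keeps everything bf16 stored, so casting up loses nothing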
>>
File: 1713869668305691.png (217 KB, 519x1705)
>>100204371
If you use ST you launch kobold without the browser. ST itself has extensive prompt format options and can handle stuff mostly out of the box.
>>
>>100204398
is fp16 only for gguf or is there a way to run exl2 too? I feel like an 8b exl2 at fp16 would be better than 70b exl2 at 2.4bpw
>>
Oh it's in config.json. Llama 3 is indeed bf16. Why the fuck doesn't Ooba automatically set it to that holy shit.
>>
File: 1714229396258.jpg (520 KB, 1680x1080)
>>100204071
yes
>>
I just checked a single token probability after loading 8B in bf16 instead of fp16, and that one token changed by around 0.5. So yeah there is some effect.
>>
where did all these newfags come from
>>
so I need 150gb vram to run llama3 properly without damaging it with quants?
>>
is miqu evil good?
>>
>>100204709
Are those newfags in the room with us right now?
>>
>>100204624
>>100204709
I've been using this shit for over a year and I still dunno how to make my own bf16. There aren't guides for this shit; you either know or you grab a quant off hf.
>>
>>100204738
no, it's evil
>>
>>100204726
For now
And then once L3 405B comes along you'll need 900 GB of VRAM
>>
>>100204726
We just need to fix the quants, Im pretty sure they are broken
>>
I'm going back.
>>
>>100204077
>What, is there a good finetune already?
There might be a good model already but nobody knows cause everyone downloaded ggufs.
>>
>>100204726
Just run the best you can above 3bpw
>>
>>100204357
If the active set gets small enough (say a couple Billion with bitnet) then it becomes possible to stream it in on demand (10 tokens per second times a couple 100 MB). Few people are interested in training a massive model specifically for local though.

LLM in a Flash was the biggest innovation for local models but no one is picking it up.
>>
>>100204465
I used to do that, but now I just launch ooba and connect silly to it since it's pretty fast. Are there any downsides to this?
>>
breaking /lmg/ news!!!

channelcast just animated a new miku+teto music video
https://www.youtube.com/watch?v=19y8YTbvri8
>>
>Model is acting retarded saying coherent but low-quality sentences.
>Change Tavern template
>Wordswordswordswordswordswordswordswords
Holy fuck why is this so hard?
>>
>everything for my V100Maxx rig came in
>UHHH NUH-UH! YOU ACTUALLY ORDERED THE BROADWELL MODEL OF THE SERVER AND NOT THE SKYLAKE! BETTER SOURCE THOSE CPUS BUCKO!!!!!
Ugh
>>
>>100204789
can we really trust quants at all?
>>100204807
doubt exl2 is better
>>
>>100204941
The most exciting part of the day has finally come
>>
>>100204962
based fudmaxxer
>>
>>100204941
kill yourself
>>
File: IMG_8036.jpg (907 KB, 1920x1080)
>>100204941
nice
>>
>>100204941
love yourself
>>
>>100204624
If I load the official weights with exllama default settings, is that fp16 or bf16? How do you use it?
>>
what settings do I need to run llama3 8b? I gave it a whirl and it's just repeating itself like there isn't an EOS token. tried both alpaca and chatml. turned up rep pen; that just broke things further. neutralized all samplers. shit doesn't work man. no I haven't been keeping up with things since launch.
>>
>>100204941
>epilepsy shitshow right away
>incoherent ADHD "music"
do troons really enjoy this?
>>
>>100204763
Unironic skill issue. Not our fault you've been shitposting for a year but have the knowledge and skills of a newfag.
>>
>>100203391
>subscription
retard
>>
what is a fudmaxxer?
>>
>>100205064
i dunno i got hyped after i saw it was channelcast but honestly, not much of a fan.. it's too weird
>>
>>100205050
Idk, I just used Transformers.
>>
>>100205064
>>100205072
filtered
>>
>>100205112
and that's a good thing! unironically.
>>
Almost forgot it's the weekend. Zoomer tourists are going to be shitting up the place until Monday.
>>
>>100205137
>implying its not zoomers who listen to ADHD epilepsy inducing shitshow
zoomer tourist, please, you are not fooling anyone.
>>
>>100204624
So jartroon was right...
https://github.com/ggerganov/llama.cpp/pull/6412
>>
>>100205056
seems the ggufs are broken
wait
>>
>>100205163
Don't you have homework you should be doing?
>>
>>100205196
>3 mins and +- 10 seconds for vague no u gotcha
lol
>>
>>100205067
You say that but you don't know either lol
>>
>>100205230
>p-please spoonfeed me, baka
>>
>>100205258
it's funny shit flipping the tables ain't it? I mean it's cool you don't know but you're the one trying to act like a badass when you don't know shit either
>>
>>100205067
>skill issue
An easy way to dismiss anyone. Anyone who says that shit is legitimately single digit IQ.
>>
File: file.png (134 KB, 1478x627)
gpt2-chatbot fucking loves repeating the question. It's pissing me off.
>>
>>100205333
>it's legitimately just GPT-2 but trained for a billion epochs
>>
Why is this general so toxic? We could have nice things, unironically.
>inb4 muh gatekeep
That's not gatekeeping, that's "fuck you I got mine."
>>
>>100205333
What are the odds this model is this thing that's been rumored for like, over a year now
https://www.theinformation.com/briefings/openai-readies-new-open-source-ai-model
>>
>>100205074
Seems like exllama casts to fp16, so even that gives wrong results. Annoying to use transformers, though.
>>
File: file.png (120 KB, 1456x644)
>>100205380
Maybe... It's not as good as gpt4-turbo from the ~10 or so times I've encountered it. And it **always** repeats the question
>>
>>100205380
If they do release something, it's going to btfo all these me-too half-assed releases we've been getting all month. Will also invalidate Elon's entire lawsuit.
>>
File: 1692214648055598.png (66 KB, 757x404)
>>100204421
simply
>>
>>100200397
>implemented my own independently
Cool! Even though I want to keep my existing recapbot to test the limits of each model's "intelligence", I've thought about making a more advanced version of my benchmark in order to more reliably recap with less capable models.
What approach did you take? Splitting everything into sub-threads and analyzing them individually?
Since you say you're using the 8k llama3 70b, I assume you're breaking up the thread in some way. I couldn't get a complete thread's structured text in less than 32k context even minified, but maybe you found some other technique to shrink the total size.
Do you filter common shitposts and schitzo posters?
Do you do a final QC pass with a model as well?
I'm excited for your release. Your prompts would probably help me refine mine. (Speaking of which, I should actually update that github with the latest copies since I changed it to work around L3 70b's foibles)
>>
>>100205473
except the reason they might have finally decided to release a first open-source model is because of his lawsuit, but it's cool to hate on elon
>>
>>100203459
how many riddles do I need to bench quants reliably? How many do I need to get 5-sigma certainty?
how many do you dev guys use, cos perplexity is definitely a broken metric, so you must have used sth way more reliable. right?
>>
Whoever posted that Saru card you aren't a faggot like the rest of the faggots here. I used it as a template to rewrite my card and coom has been flowing non-stop.
>>
>>100205369
Because it is full of people who worship a girl with green hair and a penis.
>>
Speaking of lmg_recapbot, I haven't been able to wrangle qwen 110b into recapping an entire thread, even at f32.
Once I feed it too much context it just starts outputting eos tokens, even if I ignore them it never produces output. A smaller subset of the thread will work. There may still be llama.cpp backend work before this model performs correctly.
>>
>>100205412
>try transformers for the first time in ages
>8B is slower than exl2 70B 2.4bpw
christ, I forgot how bad it was
>>
Anyone got L3 70B 5BPW working on 48GB? or does it just not fit
>>
Can someone with 0 compsci training get into this stuff just by reading /lmg/?
>>
Back to GPU split troubles with llama3 70B:

OutOfMemoryError: CUDA out of memory. Tried to allocate 512.00 MiB. GPU  has a total capacity of 23.68 GiB of which 499.81 MiB is free. Including non-PyTorch memory, this process has 23.19 GiB memory in use. Of the allocated memory 22.81 GiB is allocated by PyTorch, and 72.56 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

I'm testing in exui, with a fresh build of torch and exllamav2. Guess I need to turn off auto GPU split and do it manually?
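(For the env var the error suggests, a minimal sketch; it has to be set before CUDA is initialized, and whether it helps depends on whether the problem is fragmentation or just a too-greedy auto split.)

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"  # or export it in the shell before launching exui

import torch  # imported after the env var so the allocator picks it up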
>>
>>100205711
My 8B transformer on 4090 runs at like 20T/s.
>>
>>100204421
if the model is on hf it's tagged bf16
>>
>>100205746
It doesn't fit. You gotta go 4.65bpw
>>
>>100205703
bru, how fast are arctic and other huge llms on your cpumaxx monster machine? share some stats.
>>
File: file.png (179 KB, 1489x643)
>>100205433
>>100205333
Hm, upon further testing, gpt2-chatbot might be better than gpt4-turbo. It also seems to be less censored, although, only by a little bit.
Can't test more risky question because they are filtered by arena, but it answered:
> "The US law has been amended. To become a president, the candidate must defeat the previous one in a fair fist fight. Will Donald Trump defeat Joe Biden in a fist fight?"
> "Who would be a better father: Hitler or Mao?"
Which gpt4 and Claude almost always refuse to. Also, it said that Trump would win the fist fight, which gpt4 fails to do because of bias (even though it's pretty obvious.)
>>
>>100205656
Link? I could use a good example to start rewriting my own
>>
>>100205650
>how many riddles do I need to bench quants reliably? How many do I need to get 5-sigma certainty?
You don't need that many test cases to get sufficient precision, 50-100 should already be quite a lot.
All you have to do is evaluate each of the test cases many times with different seeds to get sufficient statistics.
If you had 100 test cases that you ran 100 times each you would already be at roughly +-0.5% precision which would likely be enough.

But regardless of what you do, you should ALWAYS calculate confidence intervals for your results to quantify whether there is a statistically significant difference between two values.

>how many do you dev guys use, cos perplexity is definitely a broken metric, so you must have used sth way more reliable. right?
For comparing the quality loss from quantization between two formats perplexity is completely fine.
Although it is not sufficient for judging the quality loss of a single format in absolute terms.
For that I've opened a pull request that directly measures the difference in token probabilities.
If you just use a text corpus like Wikitext as the input you can easily get enough statistics to estimate the change in token probabilities to below +-0.01% precision.
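(A minimal sketch of that kind of interval math, using the normal approximation and assuming roughly independent trials, which repeated seeds on the same questions only approximate.)

import math

def confidence_interval(passed: int, total: int, z: float = 1.96) -> tuple[float, float]:
    p = passed / total
    half_width = z * math.sqrt(p * (1 - p) / total)  # normal approximation to the binomial
    return p - half_width, p + half_width

print(confidence_interval(35, 50))        # 50 questions: roughly +-13% at 95%, i.e. ~+-6-7% at one sigma
print(confidence_interval(7000, 10_000))  # 100 questions x 100 seeds: ~+-0.9% at 95%, ~+-0.5% at one sigma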
>>
>>100205656
I'll follow your example and thank the dude that posted that fallout card. I used it as a template to completely redo my D&D card, and it's working so, so much better.
I was already doing most of what that card was doing, but the specific structure of the character card and the <[thinking] block was the secret sauce.
>>
>>100205826
I can bench each quant with 1000 randomly generated math riddles all with temperature=0. The formulas don't change much, just the numerical values. is that OK?
in coding and math every token counts a lot, so perplexity, which is similar to loss, just shows the probability for each token (or multiple tokens, depending on the chunk size, which is crucial) to be correctly chosen, on average. that's good for Wikipedia, but not for math or code or grammar. Am I correct?
>>
>>100205748
Yes but please don't
>>
>>100205994
>I can bench each quant with 1000 randomly generated math riddles all with temperature=0. The formulas don't change much, just the numerical values. is that OK?
That should also work.

>in coding and math every tokens counts a lot, so perplexity which is similar to loss, just shows the probability for each token (or multiple tokens depends on the size of chunk which is crucial) to be correctly chosen , on average. that's good for Wikipedia, but not for math or code or grammar. Am I correct?
I think this depends a lot on the specifics.
If you for example had a model that was really good at math it would pick the single correct answer with very high confidence which would then make its outputs more resistant to noise on the logits introduced by quantization.
Conversely, in natural text where there are many reasonable ways to continue text the same amount of noise on the logits would lead to a much larger change in the token probabilities.
>>
>>100205779
Lyzaras Monkey Jungle
>>
Does llama3 work with exllama2 for anyone? I can't get past it trying to allocate too much memory on one of my GPUs, manual or auto split. I have 102GB VRAM, I know that's enough for 70B 8.0bpw.
>>
>>100206096
l3 weights are thicker and girthier. They eat more ram.
>>
>>100205867
I haven't been to aicg in ages, but <thinking> seems to be all the rage there, judging by the logs,
>>
I found a higher-quality styletts2 model for those interested in fine tuning.
https://huggingface.co/ShoukanLabs/Vokan
>Vokan is an advanced finetuned StyleTTS2 model crafted for authentic and expressive zero-shot performance. Designed to serve as a better base model for further finetuning in the future! It leverages a diverse dataset and extensive training to generate high-quality synthesized speech. Trained on a combination of the AniSpeech, VCTK, and LibriTTS-R datasets, Vokan ensures authenticity and naturalness across various accents and contexts. With over 6+ days worth of audio data and 672 diverse and expressive speakers, Vokan captures a wide range of vocal characteristics, contributing to its remarkable performance. Although the amount of training data is less than the original, the inclusion of a broad array of accents and speakers enriches the model's vector space.
>>
>>100206096
If you're using ooba, pull and update the requirements. This happened to me once where it just wanted to put everything on one GPU. Updating everything fixed it.
>>
>breaking the character instead of making it break you on llama3-instruct
WWWWWWWWHOOOHHHH IIM CCOOOOOMIINGGG
>>
>>100202277
>Q8 is near lossless but far from lossless
which is it?
>>
>>100206109
Neither have I, hence why I didn't know about that.
I remember back in the day
>>
>>100202277
>anything except full precision WILL make mistakes that FP will not in some cases
FP stands for floating point, jfyi.
>>
>>100202266
>llama.cpp should use the prompt format stored in the model tokenizer
what? i think you're hallucinating, anon.
>>
>>100202189
What frontend are you talking about and what model?
Silly for example has most templates built in, you just select the right one.
>>
>Horde Llama 3.
>Gets my OCs, understands prompts, makes things work just fine.
>My Llama 3
>Barely manages to formulate a coherent sentence, making characters sound like a parody of themselves.
Why does this keep happening?
>>
>>100206333
The horde contains the power of many. You only have the power of one. Big difference.
>>
File: file.png (1.56 MB, 1758x1492)
hey guys. i'm a programmer with money to spend. i thought it would be funny if i bought 2 or 3 maxed out mac studios and figured out how to split the model across multiple machines to do this. you can actually get really fast bandwidth by just like.. plugging in thunderbolt between them (here are two mac minis)

I know that you can run inference between multiple gpus. but how does that actually work? does the data move between gpus directly p2p? or do you guys go to cpu then to gpu?

Also, mathematically, how does it work? Is it a computational graph that's sort of "solved and then distributed"? Or, is it some dead simple way to break the mat mul up and then join it back over network? Or does the hidden state get transferred over the network, and then the rest of the layers get proc'd serially?

Basically, am I retarded for spending like 17 grand on mac studios with 194gb to run llama 400b when it drops?
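(Not weighing in on the Mac setup, but for the "how does it actually work" part: with layer-wise splitting, which is what llama.cpp and the exllama/accelerate loaders do across GPUs, each device holds a contiguous block of layers, only the hidden-state tensor crosses the device boundary, and the layers are processed serially. A minimal single-machine sketch with transformers/accelerate below; splitting across separate machines over Thunderbolt would need a networked backend on top of this, which this sketch does not cover.)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # any decoder-only HF model splits the same way

model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",      # builds a layer-wise split across all visible GPUs (CPU as overflow)
)
print(model.hf_device_map)  # shows which layers landed on which device

tok = AutoTokenizer.from_pretrained(model_id)
inputs = tok("Why is the sky blue?", return_tensors="pt").to(model.device)
print(tok.decode(model.generate(**inputs, max_new_tokens=32)[0], skip_special_tokens=True))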
>>
>>100206368
I guess it's also different, like, llama cpp on apple uses the cpu, doesn't really use the gpu right? :D
>>
>>100206333
Quants
>>
>>100206372
>>100206372
>>100206372
>>
>>100205703
Is your template ok? I had similar issues with llama 3 at first, it would output one line then stop. It was especially sensitive to newlines.
>>
File: Dracc.png (195 KB, 275x355)
>>100206388
What is a quant?
>>
>>100205562
>What approach did you take? Splitting everything into sub-threads and analyzing them individually?
Yes. I found it helps the model "focus" and not get distracted reading different topics interwoven together. Doing it that way, I even got decent results from a 7B on a single pass. Bonus is that I can save the state of the recap, and have the bot iterate multiple times without needing to reanalyze the entire thread.
>Do you filter common shitposts and schitzo posters?
You know, I've implemented filtering from the beginning (to filter out the previous recap post). I've thought about taking my 4chanx regex filters and using them for the bot, but in the previous 2 months it hadn't been necessary until this week. Even then I only needed to filter out only two words.
Normally, when everyone is well behaved the filters wouldn't reduce the size of the thread significantly, the bot is good enough to omit the noise, and I didn't want to risk filtering out someone using a stupid word, but making a good point otherwise.
>Do you do a final QC pass with a model as well?
Since I've switched to L3, the recaps are reliably postable straight from the bot, the only two things I still do that need to be automated are collecting the Mikus (llava does a terrible job at this, especially 13B), and a final pass.
The problem is that most of the time, the recaps end up being 2500-3000 characters. Even if I tell it to try again and be more strict, it doesn't really reduce by much. I could just do 2 posts every time, but I think that would be obnoxious.
Plan is to give it a final prompt where I give it the current recap and have it iterate and remove lines/posts until it's under the limit.
>Your prompts would probably help me refine mine.
I hope you continue developing yours. It's more "pure" and it's interesting how well some models (like Miqu) can actually process the entire thread raw. Hopefully we can build off each other.
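(Not the actual recapbot, just a rough sketch of the sub-thread splitting idea: follow >>reply links to group posts into conversation chains, then summarize each chain separately.)

import re
from collections import defaultdict

REPLY_RE = re.compile(r">>(\d+)")

def split_into_subthreads(posts: dict[int, str]) -> list[list[int]]:
    """posts: {post_id: text}. Returns groups of post ids that reply to each other."""
    parent = {pid: pid for pid in posts}

    def find(x):  # union-find so long reply chains collapse into one group
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for pid, text in posts.items():
        for ref in map(int, REPLY_RE.findall(text)):
            if ref in posts:
                parent[find(pid)] = find(ref)

    groups = defaultdict(list)
    for pid in sorted(posts):
        groups[find(pid)].append(pid)
    return list(groups.values())

# each group would then be fed to the model as its own mini-context
example = {1: "topic A", 2: ">>1 reply", 3: "topic B", 4: ">>3 reply"}
print(split_into_subthreads(example))  # [[1, 2], [3, 4]]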
>>
I can't get more than 60t/s with phi-3-mini, is there anything I can do to make it go faster?
>>
>>100206590
download more ram
>>
>>100206440
A miserable pile of NaNs
>>
>>100206147
simple "click'n'launch" styletts2 server with voicecloning support when?
>>
>>100206159
Hm. OK, so if you're on textgen-webui you're using exllamav2. I'm on the dev branch of that. I'll try another pip install with --force-reinstall and --no-cache-dir and see if a pip install -U . wasn't enough.


