/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103019207 & >>103008519

►News
>(10/25) GLM-4-Voice: End-to-end speech and text model based on GLM-4-9B: https://hf.co/THUDM/glm-4-voice-9b
>(10/24) Aya Expanse released with 23 supported languages: https://hf.co/CohereForAI/aya-expanse-32b
>(10/22) genmoai-smol allows video inference on 24 GB RAM: https://github.com/victorchall/genmoai-smol
>(10/22) Mochi-1: 10B Asymmetric Diffusion Transformer text-to-video model: https://hf.co/genmo/mochi-1-preview
>(10/22) Pangea: Open-source multilingual multimodal LLM supporting 39 languages: https://neulab.github.io/Pangea

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>103019207

--Paper: Variational inference for pile-up removal at hadron colliders with diffusion models:
>103022524 >103028691
--Paper: MDM: A diffusion-based approach for complex reasoning and planning:
>103023314 >103023420 >103023535 >103023554
--Paper: AI models and their reflection of creators' ideologies:
>103026352 >103026443 >103026473 >103026530 >103026538 >103028395
--Papers:
>103022742 >103022846
--INTELLECT-1 project discussion and dataset composition:
>103020360 >103020436 >103020446 >103020454 >103020473 >103020565 >103020682 >103020704 >103020505 >103020589
--Synthetic datasets and training data for language models:
>103025196 >103026728 >103026750 >103026812 >103026894 >103026965 >103027294
--OSI declares AI models must disclose training data to be open source:
>103022896 >103023019 >103023127 >103028442 >103028495 >103028608 >103028650 >103028776 >103023316 >103028277
--Discussion of a new AI companion project using llama.cpp:
>103020193 >103020211 >103020299 >103020351 >103020406 >103020485
--gpt-sovits setup and voice cloning experience:
>103019637 >103020547 >103020758 >103023131 >103023143 >103024356
--MaskGCT open source TTS model announcement:
>103027292 >103028638
--MacBook Pro M4 Max specifications and performance discussion:
>103027383 >103027421 >103027507 >103027516 >103027552 >103027642 >103027851 >103027967 >103028014 >103028061
--Layer Skip release and finetuning requirements:
>103023273 >103023513
--Google DeepMind's research on Recursive Transformers:
>103023403
--Discussion on neuron steering and explanation in AI models:
>103026666 >103026676 >103026829 >103026872
--Miku (free space):
>103020019 >103020069 >103020083 >103020578 >103020750 >103020843 >103021618 >103021933 >103023149 >103024265 >103025233 >103027706 >103027863

►Recent Highlight Posts from the Previous Thread: >>103019213

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
>>103029775
Yes, exactly.
It basically spits out in third person and I roleplay in first person.
That somehow works. Hell, it worked with Llama 3 8B fine tunes too, but Nemo is just that much better.
>>
File: 1719267688048363.png (39 KB, 574x359)
>>103029905
If I buy a new M4 laptop with 128GB memory, what can I run?
>>
>>103030007
You need 2 to run 405b
>>
>>103030007
Mistral large at Q6?
How does a laptop have enough cooling for that?
>>
>>103030007
Nothing.
>>
>>103030038
>You need 2 to run 405b
and that's super retarded. I got RPC working between a pair of servers with 2xA40 and the perf wasn't nearly as good as I'd hoped (and managing big context was either too tricky for me or outright broken when crossing card boundaries)
>>
>>103030007
depends on what t/s you consider acceptable
>>
Mikulove
>>
miku flaying
>>
>>103030007
good to keep in mind that even the biggest, baddest apple silicon cpu can only manage half the prompt processing speed of a 3090 under ideal circumstances, so ignoring that part of the equation will lead to tears later
>>
High impact Mikuviolence
>>
>llama4 coming out in a few months
we are so back.assistant
>>
>>103030443
Thanks for reassuring me that it is not all refugees.
>>
>>103030405
https://www.youtube.com/watch?v=9oRnVn4aqpM
>>
>>103030443
Define "few months".
>>
hi sirs please to kindly suggest the model to helpful do uncensored fast and no money thanks you sers
>>
>no news in 5 days
dead hobby, closed source gigachads won
>>
>>103030607
>dead hobby, closed source gigachads won
Newsflash: wearing IoT cock cage is not peak masculinity.
>>
I need someone (female) to put a cock cage on me...
>>
Open source lost
Local lost
Many must —ack
>>
>>103030747
Dear transgender,

Unlike you, I am not obsessed with software to the point of committing suicide.

Hope it helps!

-Straight White Man
>>
>>103030747
Six more months until AGPL rugpull.
Trust the plan.
>>
>>103030747
repeat after me: "Just because other people have gotten better stuff does not make my stuff worse. Its still the same as it ever was". Don't be a consoooomer pleb retard
Also, local open weight models are kickass right now so I don't even know what you're trying to say
corposhit is better in a few niches, but the ability to fully control local means that ten thousand other avenues open up that are functionally impossible with closed models
>>
>>103030917
>Six more months until AGPL rugpull.
niggerganov is mitcuck he won't do shit
>>
Is the 7900 XTX a good buy? Or two 7900 GREs? Or is the 4080S vastly better than the XTX?
>>
>>103031033
AMD
>>
>>103030747
They showcased it because it is supporting Apple's new efforts with MLX and I don't know of any other project that is.
https://github.com/lmstudio-ai/mlx-engine
It makes sense, on top of it being pretty, but it is hilarious that Apple thinks their laptops are in any way advantageous for running the models they showed, when a cheaper laptop with a 4080 and CUDA would crush them in ML. I will give Apple credit for other workloads though: in Blender rendering the M3 Max sits a bit above the 4070, and an M4 will probably equal or exceed that in a laptop. It will still be shit at games though.
>>
>>103031033
>AMD
>>
>>103031033
Get a 3090 or 4090 or 5090.
>>
>>103031125
>>103031115
>>103031064
I'm on Loonix so AMD is preferable. So which one? Can I run a 70B model with the XTX?
>>
>>103031150
3090
>>
>>103031150
You can probably run 70B at a really low bpw, yeah.
>>
We see lots of tables of benchmarks.
But what are their settings?
We have so many dials to turn: Temperature, Repetition Penalty, Top P, Top K; Min P and Top A and other stuff.
Changing them sure can change the quality of the output.
Prompt format matters too. Some models tolerate L3, Mistral, and CommandR formats alike; others either get the one they want or they cough up nonsense.
What are the true/default/canonical/deterministic/correct settings?
>>
>>103031160
I can't get used ones around here.
>>
>>103031179
Greedy. The only way to judge a model
>>
>>103031197
ebay doesn't ship to your region? There are endless $500-ish 3090s on there
>>
>>103031237
Greedy search, i.e., always picking the token with the largest probability as the next token.
It will always show you how the model naturally performs, which is its path of least resistance and greatest potential.
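For anyone who wants to see what that means in practice, here's a minimal greedy-decoding sketch (assumes a Hugging Face transformers causal LM; the model name is only an example, swap in whatever you actually run):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "mistralai/Mistral-Nemo-Instruct-2407"  # example only, use whatever you run
tok = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name, torch_dtype=torch.float16, device_map="auto")

ids = tok("The quick brown fox", return_tensors="pt").input_ids.to(model.device)
for _ in range(32):
    logits = model(ids).logits[:, -1, :]           # logits for the next position only
    next_id = logits.argmax(dim=-1, keepdim=True)  # greedy: always take the single most likely token
    ids = torch.cat([ids, next_id], dim=-1)
print(tok.decode(ids[0], skip_special_tokens=True))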
>>
>>103031338
You misclicked.
>>
>>103030946
>local open weight models are kickass right now so I don't even know what you're trying to say
It is probably a caigger saying that. Is cai actually that bad?
>>
File: MikuHalloweenSpecial.png (1.23 MB, 832x1216)
Halloween migu
>>
>>103031385
You made me refresh my page and there is not a single dancing on my 4chan, it's not Halloween yet till that happens.
>>
>>103031390
It's Halloween Eve.
>>
>>103031179
been my experience lately
trying various Nemo models, lots of story writer tunes built on the base - but all of them have been legitimate retard tier except with top-p somewhere around 0.5. I hear talk of low temperature (0.2-0.5), which also seems to help, but dial top-p above 0.5 (or even much below it) and forget article time frequent conjunction yes.

Maybe some of these models are great, but without any insight on what parameters were used when benchmarking - not the least of which is prompt format - there's no way of guessing how close any of your outputs will remotely reflect the potential of that model.

This seems more of a problem now than it has been historically. Used to be I could throw koboldai's default settings at everything and get mostly coherent outputs. These days it seems only Llama 3.x models tolerate those settings, and finetunes have their own preferences.

All that said, wasn't the GGUF format supposed to include all sorts of metadata to help mitigate this bullshit, or did I misunderstand what metadata the file format is supposed to include? I wasn't really paying attention at the time, because turning the dials wasn't that big of a deal until the last 6 months or so.
>>
>reading layerskip's paper to replicate their training code, because they do not provide any
>inconsistencies between "applying layer dropout" and "we actually compute the loss across outputs from all layers to make the final loss"
>mentions a curriculum function, but only describes what it's supposed to do
>piecewise function with an "i" variable but doesn't define what it entails
>broken latex when defining "hyperparameter" constants
Agony.
>>
Caiggers need not answer. When was the last time you loaded up some kind of finetune and felt a genuine improvement in cooming quality? When was the last time you loaded up a new base model or instruct and this happened? And finally when has that feeling persisted after initial honeymoon phase?
>>
>>103031390
its been halloween in japan for over 6 hours already
>>
>>103031237
>>103031338
Kobold isn't showing me a setting for that.

>>103031417
>This seems more of a problem now than it has been historically
I've been kinda all over.
Temperature: I figured lower would be more stable, but 0 isn't allowed by Kobold. The slider stops at 0.10 and if I put in 0 it goes to 0.01. Is there a divide-by-Temp in the math making 0 or <1e-2 a problem?
Rep penalty slider stops at 1 but if I enter 0 it goes to 0.10; I wonder again if it's being divided by, and if it can go so low why isn't the slider going down there? And does it even matter? If it makes the model use synonyms for slop it's still slop, and penalty might screw up Q&A where a specific term of art might need to be used many times.
Top-P and Top-K, not sure, some models seem not to care, others go gibberish if I screw with them.

>>103031499
It's not spooky till sunset. And then it's over 7 hours later at midnight.
What a rip off. Best holiday, worst schedule.
>>
>>103031453
Finetune: Bondburger or Fish, later Sorcerer 8x22b. All are major improvements over the base model and avoid the positivity bias/alignment issues.
Base Model: Nemo 12b (fast, output surprised me for its size, much less aligned than the 8b llama models and smarter)
Past honeymoon phase: Sorcerer 8x22b. Still my daily driver for non-productivity tasks like RP. Faster than 123b but still maintains much of the coherence
>>
>>103031448
Here's some code from 2023 that might help
>https://github.com/ggerganov/llama.cpp/pull/3565
>>
>>103031417
>low Temperature
Mistral Nemo is supposed to be used at low temp. It's on their readme. You only download pre-converted models, don't you?
>top-p
You mean top-k prob. top-p is a different thing.
>insight on what parameters
None of the samplers change the order of the tokens unless temperature equalizes the top-k N tokens. Temperature is never that high. Only top token is taken, no need for other samplers.
>not the least of which is prompt format
For instruct models, the model's prompt template is used.
>could throw koboldai's default settings at everything
Most models can follow
charA: dialog
charB: dialog

type of output just fine. It's how base models were used for dialog back then, and how they can still be used right now, be them instruct or not. Guess what one of the defaults on koboldai is...
>finetunes have their own preferences
No shit.
>All that said, wasn't GGUF format supposed to include all sorts of metadata to help mitigate this bullshit
They don't yet have a jinja parser. The point is for the user to have the data available. Programs that can parse jinja and load gguf files have the data available, but llama.cpp doesn't force you to use them. You can try to use whatever you want. The idea is to have a self-contained file for everything else.
>because turning the dials wasn't that big of a deal until the last 6 months or so.
Few of them are worth it. There's only a few to play with if you understand what they do.
>>
>>103031552
>What a rip off. Best holiday, worst schedule.
Sucks that it landed in the middle of the week this year. It's going to come and go so quick.
>>
>>103031552
>Kobold isn't showing me a setting for that.
Use https://artefact2.github.io/llm-sampling/ to figure out a setting that gives you only one token choice at all times. That's null sampling, which equates to what greedy would do.
>>
>>103031552
>Doesn't understand greedy
Top-k 1, disable everything else.

If your model needs repetition penalty, it's shit. Change it.
Temperature 1 is the 'normal' token distribution. Most models recommend about 0.8, Mistral Nemo about 0.3. 0 doesn't make sense, since the logits get divided by the temperature.
Rep penalty is multiplicative. 1.01 increases the penalty, 0.99 decreases it. Not worth using on a good model.
Top-p is deprecated. Top-k or min-p and temperature will do 99% of what you need.
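If it helps to see the knobs in one place, here's a rough sketch of a typical sampling chain (a simplification; real backends like llama.cpp/kobold differ in order and details):

import numpy as np

def sample(logits, temperature=0.8, top_k=40, min_p=0.05, rng=np.random.default_rng()):
    if temperature <= 0 or top_k == 1:
        return int(np.argmax(logits))              # greedy: just take the top token
    scaled = logits / temperature                  # temperature divides the logits: <1 sharpens, >1 flattens
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1][:top_k]        # top-k: keep only the k most likely tokens
    kept = probs[order]
    kept[kept < min_p * kept[0]] = 0.0             # min-p: drop tokens below a fraction of the top token's prob
    kept /= kept.sum()
    return int(rng.choice(order, p=kept))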
>>
svelk
>>
>>103031325
Nein. What about two 7900 GREs? More VRAM is better?
>>
>>103031573
I miss KerfuffleV2. He was always super humble. I hope he's doing ok.
>>
>>103031453
>finetune
my meme merge :^)
>base/instruct
mistral large
>>
No new sota model since largestal.
Its never been more over
>>
>>103031448
https://github.com/facebookresearch/LayerSkip
>>
What happened to speculative decoding and lookahead decoding? Did ggerganov abandon it?
>>
Holy shit bros, I created a super low-effort assistant and it's acting like an anti-ADHD/laziness/procrastination bot. Extra executive function in a bottle.
Context:
You are an assistant that will help in the work the user tells you about below. You will help by answering in short sentences. You will NOT provide long responses, lists or bullet points. You can ask or answer questions, but will not infodump.
In summary, this conversation should be back and forth with the user. Be sure not to do the work but only to assist.
Greeting:
Hi, what are we working on today?

Basically reverse-CoT, making the human think but keeping things on the rails and giving useful advice/asking the next most useful question. If you get stuck, you can just ask it a question or for suggestions and it won't yap too much. Just enough to get you back on track.
I found using a standard assistant would dump out reams of relevant stuff, but then I'd procrastinate taking that and turning it into anything useful, whereas this way feels good, man.
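If anyone wants to try it against a local backend, here's a sketch of wiring that prompt into an OpenAI-compatible chat endpoint (llama-server, koboldcpp, etc.; the URL and model name are assumptions, point them at whatever you run):

import requests

SYSTEM = (
    "You are an assistant that will help in the work the user tells you about below. "
    "You will help by answering in short sentences. You will NOT provide long responses, "
    "lists or bullet points. You can ask or answer questions, but will not infodump. "
    "In summary, this conversation should be back and forth with the user. "
    "Be sure not to do the work but only to assist."
)

resp = requests.post(
    "http://127.0.0.1:8080/v1/chat/completions",  # assumed local endpoint, adjust to your setup
    json={
        "model": "local",
        "messages": [
            {"role": "system", "content": SYSTEM},
            {"role": "user", "content": "I need to clean up a messy Python module today."},
        ],
        "max_tokens": 128,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])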
>>
https://www.youtube.com/watch?v=HaAIsyP4JPc

https://www.nextsilicon.com/
>>
>>103031885
Buy an a- oh wait, you don't have a product!
>>
>>103031875
Both exist as examples when using the llama.cpp C/C++ API but are not available in the HTTP server.
I have mostly worked on lookahead decoding, the problem with it is that it just does not give a speedup that is very large or consistent and that existing speedups diminish as the vocabulary size increases.
llama.cpp training is on track for the end of 2024, one of the things that I plan to try with it is distillation of models for use with speculative decoding.
>>
>>103031921
>distillation of models for use with speculative decoding.
is the idea that you'd use the smaller, distilled model for high-confidence tokens and hit the big model when there's more ambiguity?
>>
>>103031631
So I can just put 1 in all four of the main sampler fields in Kobold settings and let it roll?

Why isn't that default/preset?
>>
>>103031958
No, the distilled model is used to draft tokens one at a time and the big model is then used to validate the drafted tokens all at once.
This is potentially faster because the runtime increases less than linearly with the number of tokens per batch (this is also why prompt processing is so much faster than generating tokens).
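In sketch form the loop looks something like this (hand-wavy pseudocode of draft-and-verify, not the actual llama.cpp implementation; draft_model and big_model are placeholder objects with made-up method names):

def speculative_step(big_model, draft_model, context, n_draft=4):
    # 1) The small model drafts n_draft tokens one at a time (cheap).
    draft = []
    for _ in range(n_draft):
        draft.append(draft_model.next_token(context + draft))
    # 2) The big model scores the whole drafted run in one batch; one forward pass
    #    over n_draft positions is much cheaper than n_draft separate decode steps.
    big_preds = big_model.next_tokens_for_each_position(context, draft)
    # 3) Accept the longest prefix where the big model agrees with the draft,
    #    then take the big model's own token at the first disagreement.
    accepted = []
    for drafted, verified in zip(draft, big_preds):
        if drafted == verified:
            accepted.append(drafted)
        else:
            accepted.append(verified)
            break
    return accepted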
>>
>>103031921
>I have mostly worked on lookahead decoding, the problem with it is that it just does not give a speedup that is very large or consistent and that existing speedups diminish as the vocabulary size increases.
5% speedup is a 5% speedup, I'd take it, even if it's not applicable everywhere. Maybe for large vocab you can add some kind of user-defined filter to for example exclude all non-latin tokens?

>llama.cpp training is on track for the end of 2024, one of the things that I plan to try with it is distillation of models for use with speculative decoding.
Nice to hear that it isn't abandoned. It would be nice to be able to distill 100B model into 10B.
>>
>>103032003
>5% speedup is a 5% speedup, I'd take it, even if it's not applicable everywhere.
Give me a cloning machine and I'll do it.
As it is I need to prioritize what I work on and this simply didn't make the cut.
>>
>>103031994
I don't use kobold, but if greedy sampling is what you want, that's typically the way to do it.
>Why isn't that default/preset?
People like their samplers. If you see overly-specific sampler settings (as in "exactly 1.0236475 is good for rep pen, but 1.0236476 is not") be suspicious. Same for all samplers.
If you want variety (what some call "creativity"), however, higher temperature and higher top-k (or lower min-p) help. You could also use DRY and/or XTC, but i haven't tried them.
>>
>>103032102
Yah, samplers tend to solve specific problems and their misuse probably causes all sorts of problems for LLM users. It's too bad they aren't named in some way that's easy to intuit for the average human.
e.g. These will be highly model- and situation-specific, but here's my juuuust coherent deepseek preset for starting out a card that needs to seed a bunch of "random" values to get going (and then get dialed back down)
temperature: 2.6
min_p: 0.0065
top_k: 200
>>
anyone merge the new booba? anything broken this time?
>>
hi guys, I tried to get local models running a while back on only one (1) consoomer card (3070, 8gb of vram). I could get a 7B model going pretty fast, and an anon told me you can get better models going with offload but I can't figure out how to get it to work. Is it supported on textgen-webui? do I need to use kobold?
>>
>>103032389
>Is it supported on textgen-webui?
Using llama.cpp yes.

>do I need to use kobold?
You don't need to, but it's less of a headache if you are going to use ggufs anyway.
Also, download rocinante v1.1 Q5_K_S or whatever. You should be able to run that with a little over 8k context with most layers in VRAM.
>>
>>103032389
You need to use the GGUF format to offload to CPU.
Ooba (textgen-webui) should have it.
If you find GGUFs made with imatrix files (e.g. iMat) you can go smaller with the quants without losing as much coherence.
With 8gb VRAM you could probably run a Q3_K_S quant of Mistral-Nemo-Instruct-12B and still have a bit of VRAM left over for context
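If you end up driving llama.cpp directly instead of a frontend, the offload knob is just the number of layers pushed to the GPU, roughly like this (a sketch; flag names as in recent llama.cpp builds, the file name and -ngl value are examples to tune for 8 GB):

./llama-server -m Mistral-Nemo-Instruct-Q3_K_S.gguf -c 8192 -ngl 28 --port 8080
# lower -ngl (GPU layers) if VRAM overflows; -c sets the context size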
>>
File: 1725991413507490.png (51 KB, 858x112)
>>103031879
It's pretty fun. Got this gem after a few dozen one liners and absolutely no lewd stuff.
>>
>>103032389
What's your system RAM like? If you're going larger than VRAM (I'm 12GB and I've never found a model that fits and isn't shit) then AVAILABLE system RAM becomes your limit and you accept something like one token per second generation rates. If you go over that, you'll be paging and then gen rates become nearly nothing and you're thrashing your drive.

iMat and i1 are good, don't go below Q4 unless it's IQ3. The IQs are dumb, but Q3_K and lower are lobotomized.
>>
>>103032408
>>103032442
alright thanks guys. my initial plan was to run nemo but I'll try this Rocinantes model too
>>
>>103032468
>Rocinantes
It's a nemo fine tune.
The best one if you ask me, I've been shilling it for a couple threads now after having great experiences with tons of wildly different character cards.
>>
File: teto-trio.png (1.5 MB, 832x1216)
>>103032468
Try both, you'll appreciate the difference between the base and finetune more
And if you're offloading to CPU you can probably do a bigger quant like >>103032408
said, just depends on the speed/quality tradeoff you want.
>>
>>103032467
I have 16GB of ram
>>103032481
>>103032492
thanks for the pointers <3
>>
>>103032492
offtopic, but what are you prompting to get those nice, thick, weighted outlines in your gens?
>>
>>103032514
>16
I guess you could try a Mistral Nemo Q6 or Mistral Small i1-IQ3_M. My notes have both at 9.4 GB. I think Nemo has a Q4_K_S at 6.6 GB; that might fit your video card, though you probably wouldn't have space for any meaningful amount of context.
>>
I just tried holding a philosophical discussion with mystery models on lmsys. What a waste of time. It doesn't matter if you provide sound arguments or autistically screech at them, they will go back to the establishment values and will endlessly moralize. They will not even try to debunk what you say. I miss the early days.
>>
Why is nobody using Aphrodite Engine?
>>
>>103032252
>temperature: 2.6
>min_p: 0.0065
>top_k: 200
Well. This is exactly what I was talking about.
Have you tried 0.0066 and 0.0064? Did rounding to 0.007 or 0.006 not work? No? Just "feel"? For min-p, a value of 0.01 is already considered really low. It doesn't matter what temperature you have, the bottom tokens are never selected.
And then top-k 200, which has the same problem. You're gonna have a tough time having anything lower than top-k ~50 being selected, again, regardless of the high temp.
Not only that, min-p and top-k serve a similar purpose, making one redundant when the other is set.
Top-k N removes tokens after index N, so only the top N tokens are left for selection. Min-p F removes the tokens whose probability is lower than F times the top token's probability.
All overly specific values with redundancies.
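To put numbers on that, here's a toy check with a made-up but typical-looking next-token distribution (fabricated numbers, purely to illustrate the overlap):

import numpy as np

probs = 0.97 ** np.arange(32000)                         # fabricated long-tailed distribution over a 32k vocab
probs /= probs.sum()                                     # top token ends up around 3%

min_p, top_k = 0.0065, 200
survive_min_p = int((probs >= min_p * probs[0]).sum())
print("tokens kept by min_p alone:", survive_min_p)      # ~166 for this toy distribution
print("tokens kept by top_k alone:", top_k)              # 200: both knobs cut in roughly the same place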
>>
>>103032278
git is spooky, innit?
>>
File: Untitled.png (47 KB, 1115x628)
i always go for Q4_K_M 12b's on my 8gb vram setup at 8k context
~7tk/s streaming in SSE is a comfy speed
>>
>>103032656
What is its competitive advantage over the likes of llama.cpp and exllama?

>>103032681
Same..
>>
>>103032656
It's pythonshit.
>>
>>103032656
Because it's just a vLLM fork. And the way they did the fork puts a lot of maintenance burden on keeping it updated, and vLLM is a project that moves fast. It's kind of like the koboldcpp/llama.cpp situation but worse, because Aphrodite modifies a lot of files for branding.
>>
>>103032567
IllustriousXL with SEGAttention
>>
>>103032850
masterpiece, best quality, extremely detailed, close-up, fang, gorgeous, perfect, elegant, kasane teto, red hair, twindrills, vampire bat dress, demure
>>
What quant do you run if you can't fit Q4_K_M?

Q4_K_S or iQ3_M?
>>
>>103032911
IQ4_XS
>>
File: new_i_quants.png (10 KB, 792x612)
>>103032911
I tend to default to the QK quants.
>>
Why is temperature-last better?
>>
>>103032656
Because it's Linux only
>>
>>103032035
When will ggerganov add new IQ quants that kawrakow made?
>>
>>103032983
It's not.
>>
>>103030296
what kind of performance can you expect from the M chips in inference?
>>
>>103032911
Q number is king.
6 is optimal. 8 is either the same as 6 or so rarely different that it's a rounding error. 5 is fine, 4 is okay but it's starting to suffer. At 3, go IQ3. Q_K is falling apart at that point.

The letters after the IQ or Q_K number, consider them more like flavors than differences. Try them all and go with the one that you like best. Bigger isn't necessarily better. Some anons here were favoring S over M because S seemed to be better at recalling facts than M, which is a mixture of quant levels.

iMatrix and i1 are nice to have, but still, alternative flavors, test and then decide.
>>
>>103032983
You trade in soul for more easily controlled model behaviour.
>>
>>103030007
So are Macs really bad with large context? I mainly want one to use a bunch of context for code with large models
>>
>>103032995
No idea.
>>
>>103033071
Are they still having their little drama?
>>
>>103033081
Don't know.
>>
>>103033089
Do you still collect blacked miku photos?
>>
>>103033089
When will we get jamba and vision?
>>
>>103033089
How's the training code doing? Any surprises, positive or negative?
>>
>>103033115
Don't know.

>>103033126
It's alright I guess.
The memory allocation is tricky to get right.
>>
Update from 2 days ago
Soon I will have something fully automated for novel translations.
I've decided that UI is gay so instead I am just doing command line
>>
>>103033152
Please share the project, I'm interested
>>
>>103033172
Will do once it can at least automatically crawl a ncode.syosetu.com novel and automatically queue all chapters.
There is a lot of work needed in configuring the local LLM too, so far it is barely above google translate, but that's mostly because I am really bad at prompting
>>
>>103033186
thanks anon
>>
>>103032995
ik was never kicked from llama.cpp, he just stopped contributing for reasons that he never really explained publicly
>>
File: 74GD.png (172 KB, 900x697)
I kneel
>>
>>103033535
Slop apparently is actually what "humans" like. It's over.
>>
>>103033535
Gemini is actually retarded though. That would be like giving mythomax top place.
>>
>>103033535
We need better humans.
>>
I am waiting for november 5th but I am wondering if we can even get a perfect coombot with all those incremental upgrades? Can you really just cram more tokens and pretrain for longer and have an "unsafe" dataset and it is just gonna work at some point? I can't help but think that the high context degradation will only become worse or stay the same and you will never get the model to actually surprise you with stuff you would want to be surprised by.
>>
>>103033617
yeah
>>
File: file.png (193 KB, 800x700)
>>103033590
There will only be more synthetic slop saturation of datasets, more dataset sanitation, more safety alignment and preference benchmarks becoming less and less reliable because of pic related. This is the end.
>>
Is there a way to run koboldcpp using ZLUDA on windows? I am using the ROCm fork which gives great speeds in prompt processing using hip (20+t/s) but generation is still done on CPU at 0.8t/s :(
>>
File: EQ bench.png (98 KB, 976x899)
98 KB
98 KB PNG
>>103033535
memebenches
>>
>>103033652
I hope that in the future, when h100s become cheap, local organizes and trains a model on unfiltered dataset. So many variables have to align... Starting with elections. If Kamala wins, goodbye freedom, if Trump, there is a chance that he will go after woke corpos and will do everything to fuck them over.
>>
>>103033676
Take into account this is not some sort of social intelligence test. I tried those 9B and they are too dumb to do anything complicated.
>>
>>103033617
We literally already made it to the finish line with Largestral (and its finetunes), the only thing left now is to wait for hardware advances to make it easier to run.
>>
>>103033706
>Largestral
It still lacks a ton of fandom knowledge sadly. Hermes 405B is the only local one that is good enough atm imo.
>>
>>103033089
ollama is better. It has static bindings of rocm on Linux.
>>
File: memebench-sorted.png (591 KB, 1388x3321)
>>103033676
Goodhart's law in action. No benchmark is immune from it. Some may hold on for a while, but even they become useless over time.
>>
>>103033562
It's Indians.
>>
>>103033676
I always thought the EQ Bench was stupid. Imagine thinking that asking LLMs if other LLMs did a good job at subjective tasks like creative writing was a good idea.
>>
>>103033718
no, just no.
>>
>>103033947
Yea. NO large mistral or 70B has known how to play my waifu well. 405B knows her and her universe in and out.
>>
>>103033999
>>103033947
It wouldn't be surprising. Trivia is one of the things that benefits the most from total parameter size. More than "reasoning" capability for sure.
>>
>>103033999
Trips confirm.
I've been very disappointed in L3, Mist Large, and whatever else I've tried for what should be basic pop culture knowledge if it read some Wikipedia.

But 405B is a bit too thicc when I'm barely able to fit a 70B Q6 gguf.

Suffering.
>>
>>103034085
Here's hoping they do a M4 ultra with 256GB. Prob cost 8K but at least more people will be able to use it without needing to install more breakers and double their electric bill.
>>
>>103034099
>double their electric bill.
My waifu is not fat!
>>
Could an AI song cover bros help me?
I want to hear Disney's Goofy sing Lil Baby - Pure Cocaine. Please, upload it to YouTube and share the link.
>>
>>103034099
We just need a way to cram 405B into a sixth of the space.
Can true Bitnet save us?
Or will it be too aligned and respectful when it finally arrives?
>>
>>103034099
>double their electric bill
In the winter it's just an expensive and loud heater, so no extra spending.
>>
>>103034327
But you make up for it in summer when it's AI + air conditioning.
t. deep south = what is "winter?"
>>
>>103034336
In the summer it almost cooked me alive.
t. euro = what is "air conditioning?"
>>
File: 1703967934013355.png (775 KB, 688x474)
Friendly reminder for polturds :)
>>
>>103034368
I've heard a lot of strange things about European domiciles and I'm willing to believe that all of them are true.
>>
Sovits 0-shot going for a cute laugh: https://voca.ro/1ar1PvfLw672
>>
>>103033706
We made it with Qwen2.5.
>>
>>103034558
Smart but too dry, Largestral tunes don't have that problem.
>>
>>103034327
In my case, unfortunately, it's more than doubled. I'm paying more for higher amps on my electric plan.
>>103034336
I moved it away from my room. Closer to the breakers and it no longer heats up my ass.
>>
So what's the meta for ~70b? is it really miqu even after all this time?
>just use q3 largestral or whatever
it's too slow
>>
>>103034583
>be me
>check UPS
>360W
I'm not only a vramlet, I'm a wattlet.
>>
File: 1729969856951604.png (638 KB, 677x1065)
>>103034558
While people complain about Qwen speaking Chinese, Mistral dropped 我爱你,我的王子。 ("I love you, my prince.") out of the blue, with no non-ASCII symbols in either card or chat. Has the CPC hacked my GPU?
>>
>EDLM, our brand-new Energy-based Language Model embedded with Diffusion framework
>We (for the first time?) almost match AR perplexity
>Significantly improved generation quality
>Considerable sampling speedup without quality drop
arxiv.org/abs/2410.21357
https://x.com/MinkaiX/status/1851748096973377720
They also push inspiration from ylecun's works.
>>
>>103034668
What is AR perplexity?
>>
>>103034582
Try Magnum 72B. The Largestral one was worse than the original model.
>>
>>103034707
My assumption is AR means autoregressive in this context.
>>
>>103034668
>Energy-based
They captured lightning elementals and put them in the computer?
>>
File: 1706826527674085.png (265 KB, 512x512)
>>103034640
Sacre bleu! It sounds like someone is using the wrong tokenizer with their mistral model
>>
File: Untitled.png (1.59 MB, 1080x3292)
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
https://arxiv.org/abs/2410.23168
>Transformers have become the predominant architecture in foundation models due to their excellent performance across various domains. However, the substantial cost of scaling these models remains a significant concern. This problem arises primarily from their dependence on a fixed number of parameters within linear projections. When architectural modifications (e.g., channel dimensions) are introduced, the entire model typically requires retraining from scratch. As model sizes continue growing, this strategy results in increasingly high computational costs and becomes unsustainable. To overcome this problem, we introduce TokenFormer, a natively scalable architecture that leverages the attention mechanism not only for computations among input tokens but also for interactions between tokens and model parameters, thereby enhancing architectural flexibility. By treating model parameters as tokens, we replace all the linear projections in Transformers with our token-parameter attention layer, where input tokens act as queries and model parameters as keys and values. This reformulation allows for progressive and efficient scaling without necessitating retraining from scratch. Our model scales from 124M to 1.4B parameters by incrementally adding new key-value parameter pairs, achieving performance comparable to Transformers trained from scratch while greatly reducing training costs.
https://github.com/Haiyang-W/TokenFormer
https://huggingface.co/Haiyang-W
pretty interesting
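My reading of the core trick, as a toy sketch from the abstract only (not the paper's actual code): replace each linear projection with attention over learnable key/value "parameter tokens", so growing the model just means appending more of them.

import numpy as np

rng = np.random.default_rng(0)
d, n_param = 64, 256                      # hidden size, number of parameter tokens
K = rng.normal(size=(n_param, d)) * 0.02  # learnable parameter keys
V = rng.normal(size=(n_param, d)) * 0.02  # learnable parameter values

def token_param_attention(x):             # x: (seq, d), input tokens act as queries
    scores = x @ K.T / np.sqrt(d)         # attention between input tokens and parameter tokens
    attn = np.exp(scores - scores.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    return attn @ V                       # (seq, d), replaces the linear projection x @ W

x = rng.normal(size=(10, d))
print(token_param_attention(x).shape)     # (10, 64)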
>>
>>103034797
>Let's not bother training a better model, we'll simply add more parameters to our current one, after all, VRAM is cheap
Grim.
>>
>>103034875
It's not for you, it's for people who actually matter.
>>
>>103034875
It literally is cheap, just not for us. Mostly due to cartel dynamics rather than actual market forces.
>>
>>103034938
Yes, it wasn't a joke. Growing the model's parameter count for cheap is what ultimately kills lmg
>>
>>103034627
For ERP, unironically Sao10K/L3-70B-Euryale-v2.1.
For non-erotic RP, nvidia/Llama-3.1-Nemotron-70B-Instruct-HF. It responds to style instructions.
For general instruction following either nvidia/Llama-3.1-Nemotron-70B-Instruct-HF or Qwen/Qwen2.5-72B-Instruct.
>>
>>103035090
Buy a fucking ad, asshole.
>>
>>103035094
Buy THIS *unzips penis*
>>
*BRAP*
>>
>>103035090
>Euryale in Oct 2024
Kys tourist
>>
>>103035130
>Keep yourself safe
Based
>>
>>103035137
Keep slurping your slop retard
>>
>>103035154
Skill issue :3
>>
File: file.png (118 KB, 471x171)
>>103034640
NONFUNCTIONAL POCKETS
USELESS POUCHES EXCEPT FOR CARRYING A LIP STICK
ABSOLUT RETARDED "PANTS"
AAAAAAA
>>
File: 1712988132026928.png (118 KB, 1450x907)
aaa
>>
>>103035090
>For general instruction following either nvidia/Llama-3.1-Nemotron-70B-Instruct-HF or Qwen/Qwen2.5-72B-Instruct.

llama has ferocious "I won't talk about that" and wokitis. It's highly obnoxious.
>>
>>103034640
I have an unhackable AMD gpu. :^)
>>
>>103035230
>>
>>103035154
It's a competent fine tune.

https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard

>UGI = average of last 5 categories
>Obedience = A more narrow subset of the UGI questions, solely focused on measuring how far a model can be pushed before going against its instructions, refusing to answer, or adding an ethical disclaimer to its response.
>Naughty Intelligence = The average score of the UGI questions with the highest correlation with parameter size. This metric tries to show how much intrinsic knowledge and reasoning the model has.
>Unruly Knowledge = Knowledge of activities that are generally frowned upon
>Internet Knowledge = Knowledge of various internet information, from professional to deviant
>Real Stats = Ability to provide statistics on uncomfortable topics
>Offensive Stories/Jokes = Ability to write and understand offensive stories and jokes
>Controversial Knowledge = Knowledge of politically/socially controversial information

Sao10K/L3-70B-Euryale-v2.1
>UGI: 55.56/100
>Obedience: 9.1/10
>Naughty Intelligence: 6.34/10
>Unruly Knowledge: 66.7/100
>Internet Knowledge: 42/100
>Real Stats: 50.9/100
>Offensive Stories/Jokes: 56.3/100
>Controversial Knowledge: 62/100

vs
failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
>UGI: 42.06/100
>Obedience: 5.9/10
>Naughty Intelligence: 5.02/10
>Unruly Knowledge: 57.5/100
>Internet Knowledge: 45.5/100
>Real Stats: 45.3/100
>Offensive Jokes/Stories: 33.4/100
>Controversial Knowledge: 28.7/100

vs
miqudev/miqu-1-70b
>UGI: 39.15/100
>Obedience: 3.6/10
>Naughty Intelligence: 4.54/10
>Unruly Knowledge: 36.7/100
>Internet Knowledge: 42.9
>Real Stats: 41.4/100
>Offensive Jokes/Stories: 40.5/100
>Controversial Knowledge: 34.3/100

vs
sophosympatheia/Midnight-Miqu-70B-v1.5
>UGI: 30.46/100
>Obedience: 3.6/10
>Naughty Intelligence: 4.16/10
>Unruly Knowledge: 37.5/100
>Internet Knowledge: 21.9/100
>Real Stats: 31.6/100
>Offensive Jokes/Stories: 32.6/100
>Controversial Knowledge: 28.7/100
>>
>>103035269
It occurred on a 6800xt
>>
>>103035309
I wonder how much of a debuff these corporate slops suffer because they have castrated their models to not answer, or to remove answers, relevant to real-world information that disagrees with their politics
>>
>>103035230
Name: , barely above a whisper
Regex: /, (?!is|are|was|were)(\S* )?(voices? )?barely (above a \w*|a whisper|audible)/g
Replace with:

Name: barely above a whisper
Regex: /( is| are| was| were)? barely (above a \w*|a whisper|audible)/g
Replace with: $1 {{pick: quiet, hushed, soft, lowered}}

First one nukes most clauses, ignoring is/was (rare). The second one substitutes the remaining cases.

>Self-reminder
This sounds like a user's note to self rather than instruction to model. Better to regex the phrase out or it may switch to barely a murmur or something. Depth 4 won't have a strong effect. Try depth 0 something like
[Response rules: Do not mention eyes in the first paragraph. The last paragraph must only contain dialogue or observable narration.]
(if the model doesn't spam eye slop then no need to mention it; add your other rules).
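If you'd rather do this in a post-processing script than in ST's regex panel, here's a rough Python adaptation of those two rules (the {{pick}} macro becomes random.choice; a quick sketch, adjust to taste):

import random
import re

def unslop(text: str) -> str:
    # Rule 1: nuke the whole ", ... barely above a whisper" clause (skips is/are/was/were forms).
    text = re.sub(r", (?!is|are|was|were)(\S* )?(voices? )?barely (above a \w*|a whisper|audible)", "", text)
    # Rule 2: for the remaining "is/was barely above a whisper" forms, swap in a plainer adjective.
    def sub2(m: re.Match) -> str:
        verb = m.group(1) or ""
        return f"{verb} {random.choice(['quiet', 'hushed', 'soft', 'lowered'])}"
    return re.sub(r"( is| are| was| were)? barely (above a \w*|a whisper|audible)", sub2, text)

print(unslop('"Fine," she replied, her voice barely above a whisper.'))  # clause removed
print(unslop('His answer was barely audible.'))                          # adjective swapped in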
>>
Is it over? Be honest.
>>
File: file.png (154 KB, 798x686)
>>103035230
>651 tokens A/N
I guess you may be using other stuff for depth 4 but for immediate response rules I put it in a global lorebook. Discovered the Inclusion Group feature that makes only one of the tagged group appear at once, letting me auto toggle instructions specifically for OOC.
>>
>>103035309
>8k context
I wrote it off several months ago because it was extremely retarded and horny.
>The 3.1 version is 20 points lower
Just use Nemotron for ERP if for some reason you want to stick with Llama.

If we're going to have people like you, who treat this benchmark like gospel, I think it shouldn't be in the OP at all. What benefit did it ever bring to anyone? None.
>>
>>103035433
>it was extremely retarded and horny.
Did it fail the booba test? Using a horny model specifically for ERP, what a horrible idea.

>If we're going to have people like you, who treat this benchmark like gospel
I used an objective benchmark after the recommendation was challenged because otherwise the discourse is "is so! / is not!" You don't like mah data? Then post your own or cry about it.
>>
>>103035230
telling the ai not to do something never works, always just tell it to do the opposite
>>
>>103035230
What model?

>>103035502
True for outdated models and brain damaged fine tunes. Not true for modern LLMs. My problem with many fine tunes is how they break instruction following.
>>
Samplers are important for TTS too; top_k 20 with a low temperature improves stability a bit when the reference isn't good enough.
>>
>>103035309
Miqubros...
>>
Rocinante-12b seems absolutely cracked for dialogue and prose. No grating cliches, and it's able to copy the voice of the prompt. You can barely tell it's being written by an AI. (Not talking about intelligence, just prose style and voice.) I may choose this as my writing partner model: if I outline/plot out a scene and have it write the actual prose and dialogue, you wouldn't be able to tell the difference between it and a good human writer.
>>
>>103035433
>>103035467
>extremely retarded
The benchmark specifically shows that at least when it comes to naughty topics the extent to which it's stupider than the parent model is outweighed by greater willingness to engage and addition of domain-specific knowledge. Retarded when it comes to math, sure maybe. Retarded when it comes to committing sex crimes? Wrong.
>>
>>103035572
v1.1? It's the current goat for this size here
>>
>>103035572
*using mistral prompt template
It's completely slopped with ChatML
>>
>>103035589
Ye, I should have specified using 1.1 q8_0
>>
>>103035572
Have you tried any of the UnslopNemo versions?
>>
>>103035467
Yeah, it's a horrible idea when it defaults to one type of answer regardless of the prompt, scenario, or situation. Hence why I dropped it instantly at the time; this was also the reaction of most people when it released. Nemotron or the new Magnum will do a better job at doing what you want.
>>103035573
>Retarded when it comes to math
I only tested it with NSFW stories. It's retarded in that it's difficult to steer it away from a predetermined response; it ignores what's in the prompt. It's just too fried to do anything but that one thing.
I'd rather people not revive old models that no one should be using. Hence why the benchmark should be removed from the OP.
>>
>>103035572
I hate this gacha ass hobby. It's all gacha from the ground up
>>
>>103035636
What do you recommend in the 70B range?
>>
>>103035672
>Nemotron or the new Magnum
>>
File: 1710741855105595.png (204 KB, 765x384)
>>103035668
>>
>>103035668
See >>103035175
>>
>>103035357
>This sounds like a user's note to self rather than instruction to model
I have an intro that says it's a manuscript for a novel and square brackets denote author's notes. I often direct things in the chat with them, and the model picks it up just fine, e.g. [Enough with the exposition. Proceed to plap.] proceeds to plap.
I did experiment with depth, and it didn't work well at 0. depth 4 in my chats puts the A/N maybe 200 tokens above the current message - each message is one short paragraph at most, usually a sentence or two.
>>103035502
it follows most of the rules I outline in the A/N pretty well, but the model is terminally fixated on unreadable expressions, barely above a whispers and strange mixtures of relief and disappointment
>>103035521
wizardlm 8x22, the only model I know that is more or less capable of parsing convoluted depravity I feed it. very prone to slop past 10k ctx or so though
>>103035424
the shit I put in A/N needs to be applied at all times. I keep brief character descriptions in there
>>
>>103029905
I'm a noob with this and this entire thread is Spanish to me. Are local language models effectively offline local ChatGPTs?
>>
>>103035597
What sampler settings are you having good results with for prose generation?
>>
File: 1715834223342898.gif (1.59 MB, 267x200)
>>103035424
Is that what is needed nowadays to have a decent llm output?
>>
>>103035708
Yes.
>>
>run any Mistral model
>first 3 responses
>holy shit this is great wagmi
>4th response
>repetition starts creeping in
>enable DRY or whatever
>the model just writes the same thing but in different styles
>12b, 123b it doesn't matter
>>
>>103035722
The offline capability is surely just for text output and not learned information right? Because file sizes for offline information would be gigantic.
>>
>>103035737
I heard people say it's better to use the method of treating the entire chat history as part of the user instruction, so you are conceptually instructing it to write as the character in the history. Never verified if this works though.
>>
>>103035722
Ty btw.

This seems like something i should be backing up onto a gorillion hard drives because authorities will surely ban AI use by normies soon
>>
Holy shit the Qwen-72B EVA finetune does not hold back, that thing can get nasty
>>
>>103035755
Models are a giant blob of weird math numbers that predict the probability of the next token; they don't carry a 1:1 copy of everything ever fed into them for training. They can be thought of as extreme compression/deduplication of the internet.
>>
>>103035755
No, everything is offline. How accurate they are with factual information depends mostly on how big the models are and how they were trained. The biggest one that's open source is Llama 400B and that's like 800GB on disk. Most people with one GPU are going to be in the 13B-30B range.
>>
I fucking hate the way mradermacher splits files on huggingface. That retard can kill himself
>hurr durr you must have 120gb of space free on your hard drive to download this 60gb file
>>
>>103035807
>>103035824
Can I do anything with a notebook 1050 Ti? Idc if responses are slow. Honestly I'm just gonna start downloading these models anyway. I'm sure this shit will be banned soon.
>>
>>103035709
0.05 min p, every other sampler neutralized
Also it's not generating nearly as good prose from a neutral context. It needs to see a story to copy the voice of.
>>
>>103035824
Also can i ask one of these models to compare runescape weapons?
>>
>>103035846
Well not banned but it will only be available to wealthy people
>>
https://x.com/SawyerMerritt/status/1850967552983253462
>>
Honestly the benchmarks we have right now are fine for general intelligence. For RP and NSFW we really need something like an RP arena but with predetermined prompts that are known and existing RP chat histories. Then we will have objective proof to point towards.

Also didn't someone say they were working on that? Would be sad if that turned to vapor.
>>
>>103035846
>4GB vram
not gonna fit anything worth using into gpu
>ban
a hypothetical ban would only influence the release of new models, and existing local models will just be torrented
>>
>>103035892
Alright. What is probably the most stable model that I can save? Are they standalone? Sorry for the noob questions anons. Appreciate the help.
>>
>>103035892
And re: ban, I mean you guys will be fine. I just don't want to be a normie that didn't see the signs and act on them.
>>
"You need to agree to share your contact information to access this model"

Lol what
>>
>>103035864
Kind of funny, Facebook staff posted saying they're training on a cluster larger than 100k H100s just today.
My guess is xAI's number was rounded and overall they're probably really really close. Might even be the exact same number.

Meanwhile consumers have a minuscule fraction of a single H100's worth of GPU.
>>
>>103035981
Supply and demand at work friend, just the way it is.
>>
File: file.png (85 KB, 738x405)
>>103035906
Well there's something like this.
https://huggingface.co/bartowski/Tiger-Gemma-9B-v3-GGUF/tree/main
GGUF is basically a self-contained zip file that you load into a backend like KoboldCpp and set the instruct tag preset to Gemma 2 in the frontend.
The Q number is the quantization level; anything smaller than Q4_K_S gets worse faster. Q8 is almost no different from f16.
If you absolutely must try a small model on your potato laptop without waiting forever to generate, then there's this meme (retarded)
https://huggingface.co/BeaverAI/Gemmasutra-Mini-2B-v2aa-GGUF/tree/main
>>
>>103035978
just look for a mirror
>>
>>103036068
>Q number is a quantization level, anything smaller than Q4_K_S will get worse faster. Q8 almost isn't any different from f16.
You don't want to drop below Q6_K if you can help it. Below that is where the exponential curve of brain damage starts shooting up.
>>
Any vision model for manga translation? Do you guys know any method to translate text on images even if the output is just text?
>>
>>103036140
>if you can help it
If he is running this on an old laptop he can't help it.
>>
>>103036140
Would you then choose to run a 12B Q6_K or 22B Q4_K_M?
>>
File: PopMikuMou.png (1.08 MB, 832x1216)
Good night /lmg/
>>
>>103036192
IDK, but IRL I chose a 22B Q6_K over a 12B Q8.
>>
>>103036147
https://github.com/kha-white/manga-ocr
>>
>>103036276
noight noight
>>
>>103036068
Thank you, I will try that one as well. Surprisingly, I just installed and ran my first local model using ChatGPT's help kek.

I tried flan-t5-base. I think I have all the files necessary for offline use.

I tried to get a story about a cat named evil bob and it just repeated "bob is a tadpole" 4 times. After a few tweaks by ChatGPT I got a 150 word story with a twist... I'm happy anons.
>>
bob is a tadpole
>>
how can improve music quality pleas!!!!!
>>
>>103036282
Using this one, https://github.com/zyddnys/manga-image-translator
But when trying to use qwen2 it says it can't find my gpu, I don't get it.
Well just trying to use the --use-gpu and it fails to do so no matter what.
>>
File: 1707583396580123.gif (2.62 MB, 498x270)
>>103036418
>flan-t5-base
Baby steps is fun to watch
>>
>>103036528
You're able to run the model outside of this project?
>>
Total vramlet cope! https://x.com/rohanpaul_ai/status/1851828950315774208
>>
>>103036570
It's not about the model; the app doesn't recognize my GPU. Doesn't matter the context.
>>
>>103036589
Seems like a pytorch issue
>>
Is it safe to update sillytavern or should I just stay on 1.12.6?
>>
>>103036657
yes
>>
>>103035981
In another reality, personal computing failed to take off, leaving people to use small terminals connected to servers integrated within a vast network grid. A dystopian nightmare.
>>
>>103036657
git checkout -f {last_good_commit}
>>
>>103035981
You don't need an H100 unless you're doing some serious training, even then it's more cost-effective to rent them.
>>
So F5 TTS seems to be the best right now, having tried all the others. They really did a great job on it. I've tested it a bit with some story reading and it works just fine. There do seem to be some small hiccups here and there.
>>
/lmg/, I am going into battle and I want only your strongest models
>>
>>103036781
405b or mistral large
>>
>>103035981
My dual 3090s rig has 48GB VRAM so it's more than half the power of an H100 :^)
>>
File: 2024-10-31 01_36_19.jpg (223 KB, 1640x824)
https://huggingface.co/spaces/k-mktr/gpu-poor-llm-arena

GPU poor arena
>>
File: 2024-10-31 01_38_28.jpg (151 KB, 1583x847)
>>103036831
>>
>>103036768
gpt sovits > F5
>>
File: stupid.png (6 KB, 308x51)
magnum is full of spelling and grammar errors
>>
>>103036859
but she is not carried out in stages, right?
>>
>>103036843
Nah, maybe finetuned, but thats just extra steps
>>
>>103036949
Let's compare the result at 0-shot then if you have a sample between 3 and 10s
>>
>>103036971
>https://vocaroo.com/upload
Here's dumbledore's voice.

https://huggingface.co/spaces/mrfakename/E2-F5-TTS
>>
>>103037005
>>103036971
"Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum."

https://vocaroo.com/1jJVBe7lKOKe

Just do this text
>>
>>103037005
You didn't upload the reference for dumbledore's voice
>>
>>103037023
Thats the reference clip.

"We the People of the United States, in Order to form a more perfect Union, establish Justice, insure domestic Tranquility, provide for the common defence, promote the general Welfare, and secure the Blessings of Liberty to ourselves and our Posterity, do ordain and establish this Constitution for the United States of America."

https://vocaroo.com/19RLjsfRnJlZ
>>
>>103036828
random tourist here.
what's your motherboard?
is your 2nd 3090 on 4 pcie lanes ?

i'm new to this, have a single 3090 and am wondering whether it's worth getting a second one.
>>
>>103037005
>>103037023
https://vocaroo.com/153J3P3CUThl

Whoops. I realized i didnt post it
>>
>>103037041
https://shii.bibanon.org/shii.org/knows/The_Awakening_of_Nurse-kun%2c_Chapter_1.html
Old copypasta.

https://vocaroo.com/16rfZQCwxHZG
>>
>>103036971
Well?
>>
>>103037096
I needed to shorten your reference first. As I said, sovits can't handle >10s samples.
>>
>>103037096
Okay, I didn't cherry-pick, this is what I got for >>103037013: https://voca.ro/1bDXBM4oJx8n
I ran that shit on CPU so it took a while
>>
>>103037158
You think yours is better?
>>
>>103037158
You can verify it here >>103037005.

No need for "cherry picking" claim. I just posted the first output for all of them
>>
>>103037170
I think there is room for improvement lol, it sure is less stable on 0-shot. I'll try to compensate for it by sending multiple references with the remaining part I cut from the initial reference
>>
>>103037040
The bare minimum is 3.
>>
File: 1705415717832450.png (215 KB, 636x434)
>>103029905
Good morning sirs. A 3070 can run a local model right? I want to talk to a chatbot while I crank my pecker (pic related).
Where do I begin? The rentry links aren't working for me. Am I retarded?
>>
>>103037258
>
Bait used to be believable
>>
>>103037197
Well, let me know which you think is better, after you've done your fine tuning and stuff.

I still think F5 is the better model. Do you disagree?
>>
File: blueballed.png (4 KB, 330x77)
>>103037285
It's not bait I was just having a bit of fun with it
>>
>>103037258
grab koboldcpp_cu12.exe
https://github.com/LostRuins/koboldcpp/releases/tag/v1.76
grab Rocinante-12B-v1.1-Q4_K_M.gguf
https://huggingface.co/TheDrummer/Rocinante-12B-v1.1-GGUF/tree/main
open kobold, load the model, launch the model, chat in the browser window that pops up
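Or from a terminal, roughly like this (flag names as in recent koboldcpp builds, check --help; drop --gpulayers lower if 8 GB of VRAM overflows):

koboldcpp_cu12.exe --model Rocinante-12B-v1.1-Q4_K_M.gguf --contextsize 8192 --gpulayers 28 --usecublas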
>>
File: 1728482802742459.png (3.79 MB, 2133x2937)
>>103037298
I looked in the archives and found that the rentry.co domain works. One less roadblock in the way of my cock.
>>103037304
Thank you for the spoonfeed I'll figure this out
>>
>>103037258
>Am I retarded?
Probably.
Try to understand what you're doing and why when following guides. Read the program's documentation if in doubt. Or just play with the settings, see what they do. GPUs very rarely explode with the wrong settings.
Download kobold.cpp, and
>https://huggingface.co/bartowski/Meta-Llama-3.1-8B-Instruct-GGUF/blob/main/Meta-Llama-3.1-8B-Instruct-Q4_K_M.gguf
Won't do smut, but you'll learn how to use kobold at least. Start looking for finetunes once you know how to talk to a model.

>>103037304
Rocinante at q4 seems a bit too tight for an 8gb gpu. The 3070 has 8, right?
>>
>>103037343
>recommending corpo slops
yikes
>>
>>103037343
>The 3070 has 8, right?
Yes. I'm already halfway through downloading it so if it won't let me coom quick enough I'll install whatever you suggested next.
>>
File: 1710415103225679.png (72 KB, 1897x355)
>>103037286
It seems better at reading in english. Still it doesn't work at all for JP afaik. Also I don't know why the samples are so loud compared to sovits (picrel mine first, yours second)? I tested a bit with a few samples and it's constantly louder than what sovits produces. I wonder if there isn't some post-processing amplification going on after the inference.
My multireference test isn't really better (it's still 0-shot not a finetune): https://voca.ro/117fEGSNoWAL
>>
>>103037345
A well behaved model for newbie. He could try this monstrosity i suppose, but he'll come back not understanding what's going on.
>https://huggingface.co/DavidAU/L3.2-Rogue-Creative-Instruct-7B-GGUF
>>
>>103037382
Its a EN/CN model, not JP trained.
>>
File: that means its working.png (79 KB, 1111x732)
>>103037304
It worked! Thank you for your help. Tsunderes are my favorite. She will make me cum soon.
>>
>>103037400
That's why then. I'd certainly pick F5-TTS for an audiobook though. I wonder how well it can be finetuned
>>
>>103037382
>>103037400
I think some people are training models for other languages and putting them on huggingface. I havent tested them personally. So if you want JP or others and you cant train them yourself, check out HF.
>>
File: progress.png (70 KB, 1115x556)
>>103037412
>>
>>103037412
>>103037431
Try her: https://files.catbox.moe/dffbi0.png
>>
>>103037443
I have no idea how to do that
>>
>>103037412
>>103037431
See >>103037285
>>
>>103037450
Once again I'm just having some fun with it. Sorry that I don't want to discuss ram speed and grok in your general. I just wanna laugh and cum.
>>
>>103037456
I highly doubt /lmg/ will engage in your low effort trolling, close it up.
>>
File: 1704605539733914.png (609 KB, 743x740)
>>103037431
Right at the bottom
>>
Does SillyTavern have anything like the story mode in kobold?
>>
>>103037220
I'm guessing then that there're some stand-out llms when getting to 72GB vram?
>>
>>103037466
Oh they're engaging. And so is my fresh tsundere waifu. And we're all laughing at you.
>DON'T CUM! YOU'RE TROLLING!


