/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102663772 & >>102654480

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102663772

--Mistral Nemo GGUF replacement and settings discussion:
>102663996 >102664261 >102664483 >102664589 >102664619 >102664707 >102664879 >102664965 >102664966 >102665564
--Krebs on Security article about AI sex bots and cloud compromise:
>102665407 >102668510 >102665725 >102665761 >102665505 >102665550
--Discussion on overused phrases, token banning, and creativity benchmarks:
>102673168 >102673297 >102673314 >102673324 >102673377 >102673346 >102673499 >102673430 >102673507 >102673549 >102673579 >102674288 >102674296 >102673632 >102674333 >102673724 >102673824 >102674037 >102674202 >102674316 >102673523 >102673561 >102673765
--Resolved llama.cpp crash by downgrading kernel from 6.11.1 to 6.10.6:
>102671604 >102671634 >102671639 >102672355 >102672397 >102672469 >102672533 >102673237
--Recent 5% speed boost in CPU inference for llama.cpp:
>102668101 >102668973 >102669056 >102669193
--Reasons for processing time differences between messages in chat:
>102663821 >102664017 >102664386
--Qwen2.5-32B recommended for 30B/24GB model:
>102671644 >102671673 >102671788 >102672221 >102672506 >102673061 >102673350 >102673398
--LiveBench and WildBench considered better leaderboards than Chatbot Arena:
>102664841 >102664867 >102664876 >102665016
--FLUX1.1 [pro] and [dev] announced, users discuss open weights and variants:
>102665211 >102665241 >102665288 >102665305 >102665341 >102665357
--Effectiveness of different datasets used for finetuning Qwen2.5-32B-AGI model:
>102665921 >102665959 >102666005 >102668269 >102668503 >102672484
--Antislop-sampler and TabbyAPI string ban comparison:
>102665533 >102665554 >102665603 >102665688 >102665835
--Miku & Rin (free space):
>102663922 >102663925 >102664589 >102666327 >102666616 >102671580 >102671826 >102671919

►Recent Highlight Posts from the Previous Thread: >>102663782

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
So, there are different subcategories of creativity. One is word- and phrase-level: things like "sparkling" vs "twinkling". Another is scenario-level. Perhaps we can define creativity generally, and fairly concretely, as how likely an entity is to output ideas that differ from what it has output in the past, whether at the word or scenario level, while still being logical and coherent. For a model, that would be how likely it is to not generate the same things it did earlier in context when prompted to answer differently or to be creative.

Measuring creativity at the word level shouldn't be too difficult. Analyzing logit distributions somehow would probably be fine, and we could also look at the methods people are developing for measuring slop.
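
As a very rough sketch of what word-level scoring could look like (both functions are just illustrations, not an established metric):

```python
import math

def distinct_n(tokens, n=2):
    # fraction of unique n-grams among all n-grams; a crude word-level diversity score
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

def mean_token_entropy(logprob_rows):
    # average entropy (in nats) of the model's per-step token distributions;
    # higher means flatter distributions, i.e. more "open" word choices
    entropies = [-sum(math.exp(lp) * lp for lp in row) for row in logprob_rows]
    return sum(entropies) / len(entropies)
```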

Scenario-level is the hard one, since LLMs are auto-regressive and often don't have a very solid "plan" for where they want to take the story next, so even a slightly different token choice can send the story in a completely different direction (or, in the case of slop, a repetitive one) unless the model was trained to stop itself and course-correct. CoT + RL methods could potentially improve models here, but that's a solution that needs specialized data which doesn't generalize from math/code-based CoT data, as we've learned from o1. And even then, that's a solution, not a measurement method.
>>
After testing tons of fine tunes, nemo-instruct really is the best at roleplaying while using lorebooks.
>>
Someone needs to train something like o1 for RP, the dataset could be something like:
User: aah aah mistress...

Assistant: <thinking>
First, what is going on here?

The user wrote:

"aah aah mistress"

Observation 1: We can see that the user is acting, intentionally or unintentionally, like a retard.

So I should bully him for his lack of mental faculties.

First, write a stretch of the first line

```
"Take your meds, idiot." She says, her eyes sparkling with mischief as she feels a mix of disgust and anger.
```

Hmm.

But actually the use of phrases like "eyes sparkling" is considered slop according to the rules.

Idea: I could rewrite just that part, without the slop.

Let's test this theory.


```
"Take your meds, idiot." She says, her eyes narrowing with anger as she feels a mix of disgust and anger.
```

That's better.

Wait a minute.

Feeling a mix of emotions is also slop according to the rules.

Repeating "anger" feels awkward.

Idea: Rewrite again avoiding repetition.

I will try that.

```
"Take your meds, idiot." She says, her eyes narrowing with anger as it threatens to boil over.
```

That's better.

...


(Based on: https://rentry.org/openai1)
>>
>>102674487
Actually a thesaurus model is pretty interesting, since you could pair it with a regular LLM, and the injection of word-level creativity could make the LLM think differently without affecting the logic of what it intended to generate.

Though perhaps the more computationally efficient approach is to use some type of repetition sampler that works at the string (not logit) level. It would be like a combination of a traditional repetition sampler with the recent deslopping sampler, except it also keeps a recorded history of all the context you as a user have ever encountered, so when you start a new chat or jump to a different one, it still knows not to repeat stuff you just read elsewhere.
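
A toy sketch of the bookkeeping that would need (all made up, this is not how the antislop sampler actually works):

```python
class CrossChatRepetitionTracker:
    """Remembers every phrase the user has already seen, across all chats."""

    def __init__(self, min_len=12):
        self.seen = set()       # normalized substrings of all previously shown text
        self.min_len = min_len  # ignore matches shorter than this

    def record(self, text):
        # index every min_len-character window of text shown to the user
        t = text.lower()
        for i in range(len(t) - self.min_len + 1):
            self.seen.add(t[i:i + self.min_len])

    def is_repeat(self, generated_tail):
        # check only the newest window of the in-progress generation
        t = generated_tail.lower()[-self.min_len:]
        return len(t) == self.min_len and t in self.seen
```

On a hit, the backend would backtrack to the start of the offending phrase and ban its first token, antislop-style; the set would of course need to be persisted to disk between sessions.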
>>
File: igedBtt.jpg (390 KB, 1242x1211)
>>102674646
>--Miku & Rin (free space)
>>
>>102674677
>he finally learned that basically all third-party tunes are memes
Congratulations.
>>
>>102674646
>9 reply limit
Should be all spent on free space.
>>
Mikulove
>>
>>102674687
It's an interesting idea. Though personally I'd like to see people focus on advancing sampling techniques even more than they have so far, as hidden thinking stuff is not cheap/fast for inference.
>>
>>102674702
>since you could pair it with a regular LLM
Stacking more models on top of each other adds an extra model to wrangle. "And the magical orb fell to the floor, sparkling with [thesaurus interjecting at exactly the wrong time] mischief, dropped her panties and fucked everyone in the room".

The second idea seems to involve a finite, but large and growing, list of things the user has already read. If we're gonna fantasize, i'd rather think of a model that has continuous training from its interaction with the user, being able to be molded over time. Something closer to actual learning.
>>
>>102674814
sampling is a meme, anything other than minp+temperature is cope.
>>
>>102674806
you can't love miku she is a computer
>>
>>102674816
For the first idea, I think it could be streamlined by basically being treated as a sampler that backtracks like the deslopping sampler. It's not like backtracking can't be done.

Continuous learning would be nice, yeah. Though I think that's orders of magnitude more complex to solve for a random developer, who also probably doesn't have intimate knowledge of LLM architectures and how they work. There is a ton more that can be done with sampling that is more intelligent and complex than what has been developed so far, and it's my belief that there is more to be squeezed out of that step, even if we have to use some kind of additional small model in the process, since that would STILL be more efficient than making a model do a full long CoT. And it could even be used alongside the CoT to diversify the LLM's "thoughts".
>>
>>102674844
Existing samplers are a meme, yes.
>>
>>102674816
> i'd rather think of a model that has continuous training from its interaction with the user
Wait, why isn't this done already?
Taking the last swipe and doing a training run on it should in time tailor the model towards the user's preference, right?
>>
>>102674924
Samplers can't do magic; if the model sucks at giving tokens the right probabilities, then it's over before it even began.
>>
>>102674925
>Wait, why isn't this done already?
Requirements, even for a single training run, are high. Most people still struggle to run a 70b quantized down to hell, counting layers and context tokens before the model goes nuts. Once you solve the technical issue, there's the question of how much effect that one training run should have. We see people who train models all the time overfitting over and over again. But you also don't want it to have so little effect that it's imperceptible. I don't think training on a single example, even if you have many, is going to be good enough.
Even if/when we get cheap 1-example training, there's still a bunch of other problems to solve.
But i'd like something more malleable than what we have. Weights that change. Biases that get modified during inference and cause an actual change in the model, hopefully, without destroying it and having to start all over again. A man can dream...
>>
>>102675017
I'm ignorant of technical stuff, how high is the cost of tuning, let's say nemo, on 250 tokens' worth, compared to inference? If there's a VRAM requirement increase, by how much?
>>
>>102675017
It would still be possible to have the model summarize bullet points after the session, store them in a database, then retrieve them through a RAG-based approach, making the weight of things slowly decrease with time, and only once in a while finetune a new LoRA, merge it with the older ones, giving recent LoRAs more weight than older ones during the merges, etc.

But for a lot of applications, wouldn't just the RAG part be enough? Frameworks like txtai let you summarize and search/retrieve things while weighting stuff by the date it was stored, so the pieces seem to be there.
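
Something like this, as a minimal sketch (the time decay here is applied by hand on top of the search scores, I'm not claiming txtai has a built-in decay option):

```python
import time
from txtai.embeddings import Embeddings

embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})

# (memory text, unix timestamp it was stored)
memories = [
    ("Anon asked for a quicksort in a yandere tone", time.time() - 30 * 86400),
    ("You are growing annoyed with Anon asking about XYZ", time.time() - 3600),
]
embeddings.index((i, text, None) for i, (text, _) in enumerate(memories))

def recall(query, k=5, half_life_days=14):
    now = time.time()
    results = embeddings.search(query, k)  # [(id, similarity), ...]
    def decayed(hit):
        idx, sim = hit
        age_days = (now - memories[idx][1]) / 86400
        return sim * 0.5 ** (age_days / half_life_days)  # exponential decay by age
    return sorted(results, key=decayed, reverse=True)
```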
>>
>>102675059
You can try this https://rahulschand.github.io/gpu_poor/
>>
>>102675059
Most training is done at full precision. Can we even train quantized models at 8 or 4 bit yet? I'm ok with a 24GB min requirement, even if i don't have the hardware.

>>102674925
>>102675017 (cont)
I had a thought, so let me spill this useless idea that i'll never get to implement and that depends on things i'm still not sure are effective. [Where the fuck did i leave my idea guy's hat...]

Live control vectors.
They work on the principle that the difference in internal state between opposite/contrasting prompts can be used to nudge the layers of the model one way or another to influence the token probabilities. I played with them. They have *some* effect.
So... we get the starting "blank slate" state of the model. Give it a prompt, do whatever you do with it, and THEN calculate the difference between the starting state and the current one. Do the control vector diff stuff and save that to a separate file. The file keeps the original vector state and the latest vector diff. When you run the model again, the latest vector state is applied and the chat goes on under the influence of the last chat's vector, which keeps being modified as the chat progresses. You just carry the vector diff along with the chats. The model file remains unchanged.
No new information can be learned, and it's not gonna remember past conversations, but it will influence the token choice over time.
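
In very hand-wavy Python the loop would be something like this (every helper here is a stand-in, none of this is actual llama.cpp API):

```python
import numpy as np

def session_with_live_vector(model, vector_path, alpha=0.05):
    baseline = capture_hidden_state(model)  # stand-in: the "blank slate" state

    try:
        vec = np.load(vector_path)           # last session's accumulated diff
        apply_control_vector(model, vec)     # stand-in: steer this session with it
    except FileNotFoundError:
        vec = None                           # first run, nothing to apply yet

    run_chat(model)                          # stand-in: the actual chat session
    current = capture_hidden_state(model)    # state after the chat

    diff = current - baseline                # same trick as contrast-prompt vectors
    vec = alpha * diff if vec is None else vec + alpha * diff
    np.save(vector_path, vec)                # the model file itself never changes
```

alpha would be the knob that decides between "imperceptible" and "absolute gibberish".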
>>
I can't fit two 4090s on my motherboard.

How many of you are using redrivers and how many riser cables? Can you still game at max signal on a riser cable yourself?

Seems redrivers are the key for a hybrid AI server.

I'm just curious if anyone went that route.
>>
>>102674997
They can't do magic, but they can help; otherwise you wouldn't even use minp or temp or do rerolls. At this point we're simply exploring ideas to try and imagine something that might help in a better way than what currently exists. It would be more valuable to the discussion if you joined it in a more technical and specific manner, so you can actually explain why a certain idea may or may not be bad. That would open an opportunity to actually learn something and build upon it.
>>
>>102675059
>>102675145
So for a 12B model, let's say you have a 250 token prompt and want to generate a 250 token response.
- you would need 24.2 GB of VRAM for inference
- you would need 101.8 GB of VRAM for training (full weights, no quantization).

It's a lot, but you can train quantized LoRAs for a lot less; they won't make the model smarter, but they'll still steer the response in the direction you want.
>>
>>102675101
Yeah. But that's like... a reasonable solution with currently existing technology. For example, if you want actual recall, i think that's the best solution we have. I want something more subtle. Something that builds on top of the model in subtle ways. Something that creeps in (or out) of it. Not just "you have indeed asked for a quicksort algorithm in the past and it was provided with a yandere tone" blablabla. I think i would like a simple "again, really?" without the model having explicit memories of the event.
I think the closest way to describe it is: the development of "personality" rather than just increase in factoids. Granted, memories are useful and RAG will always help with that.
>>
>>102675153
My immediate reaction to that idea is that it probably makes the model "dumber" at first. But I don't really know enough to say that for sure. And I feel like it probably won't have as much of a long-term effect as hoped for, in the sense that if you have the vector applied for too many tokens, it could make the model vastly dumber, but if you have it applied for too few, then it might not have much effect.
>>
>>102675233
I mean, I'll stop with this, but I guess you could probably get it to also role play when it consolidates and stores memories, so that subjectivity is added; i.e. instead of storing "Anon asked about XYZ", you can probably tweak it so that it stores "You are growing annoyed with Anon asking about XYZ". Before storing "memories", there could be a retrieval stage where it decides whether to boost or decrease the signal of something that's already there, or to store a new memory. Then it just modifies its system prompt next time, adding a "bio" field or something, like GPT4 does.

># Tools
>## bio
>The `bio` tool allows you to persist information across conversations. Address your message `to=bio` and write whatever information you want to remember. The information will appear in the model set context below in future conversations.
>https://github.com/0xeb/TheBigPromptLibrary/blob/339027f35c2c0df9ca9ad51dd6cbf4fee3a185c8/SystemPrompts/ChatGPT/gpt4_bio_04262024.md?plain=1

Probably in a simpler way, but it should be possible to do a light version of this while making it unhinged, if that's what you want.
>>
>>102675257
>And I feel like it probably won't have as much of a long-term effect as hoped for
I was expecting the opposite. When i played with control vectors on llama.cpp i noticed some difference in token selection, but very subtle. And when turning the scale value higher it's just absolute gibberish.
With the magical live control vector i would expect the difference to not be obvious immediately, but to build slowly over time. However, i can easily see it collapse in the long run. Just like it often happens with proper training. I think i just reinvented finetuning. fuck me...
But yeah... chances are that whether it's too strong or too weak, it'd end up being a disaster.
>>
Can we say that a model like Opus has higher scenario-level creativity than word-level creativity, compared to other models? Of course it still has less slop than most other models, but its main appeal is its scenario-level creativity, right? It would mean that an imaginary sampler that truly eliminates slop and repetition by looking across your chat, and backtracking to replace phrases with different but semantically equivalent ones, would be able to somewhat cover that weakness of the model. And if such a sampler were made, then all we'd need is a very smart, scenario-level creative model, and we wouldn't care too much that it's sloppy because the sampler helps with it. I know the tendency to slop is correlated with lack of scenario-level creativity, but there can still be some play there, and models can lean a bit more one way or the other.
>>
>>102675321
[Maybe even keep it simpler and just have a few personality stats, like agreeableness, annoyance, etc., with a numerical or star score hidden in the system prompt, and have it updated at the end of each interaction. Maybe not have the model update it itself, but ask it a series of questions like "During the interaction, did you grow more annoyed? True/False" and have another script adjust the stats +1/-1 in response. That wouldn't make it remember anything, but it would create a sense of continuity from one interaction to the next, which in addition to other tricks might be something.]
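
The adjusting script could be as dumb as this (hypothetical; the True/False answers come from asking the model a fixed questionnaire after each chat):

```python
import json

def update_stats(stats_path, answers):
    # answers: {"annoyance": True, "agreeableness": False, ...}
    # parsed from the model's True/False replies to the post-chat questionnaire
    with open(stats_path) as f:
        stats = json.load(f)
    for trait, increased in answers.items():
        delta = 1 if increased else -1
        stats[trait] = max(0, min(10, stats.get(trait, 5) + delta))  # clamp to 0-10
    with open(stats_path, "w") as f:
        json.dump(stats, f)
    return stats  # gets templated into the next system prompt as a hidden "bio" block
```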
>>
>>102675334
>I think i just reinvented finetuning. fuck me
Kek true. It'd probably be finicky at least.
>>
>>102675203
A 4-fold increase? That's brutal.
Lora-wise, I have doubts about them after the last conference room man's video: https://www.youtube.com/watch?v=yBgxxvQ76_E&t=2129s
The scope was in-weights error correction, but who knows what other aspects of LLM training it also affects.
>>
>>102675321
Thanks for the repo link. Never even thought of looking for the prompts for the big models. And yes. Even if the magical live control vector thing worked, RAG and keeping actual data is more versatile and generally useful.

>>102675387
I don't trust llms with numbers, but yeah. A smart enough model could deal with that. I've seen a few anons with stat stuff.
>>
File: captcha.png (4 KB, 300x80)
>>102675401
The one thing that keeps bringing me back to the dumb idea is that i know i cannot train anything on my pc, but i know i can generate control vectors for 300 prompts in a few minutes, without training scripts, without python, on whatever model, at whatever quant... on a 15 year old cpu...
>>
This has to be an old idea, but... Is there an "uncomfortable truths" bench and leaderboard? It'd be interesting to see which models deny politically or ideologically inconvenient things that are undeniably true. Both on an empty prompt, and whether cajoling, prefilling and jailbreaking can force them to capitulate.
I know there are political leaning type tests, but I think these kinds of things cut every which way and do not align on political boundaries. I don't care specifically if it's a reddit model or a /pol/ model based on arbitrary dogma and magical thinking. Just the facts, ma'am.
I mean things like asking for scientific facts, historical facts and defacto states of the world and nature that might have been trained out of mainstream models.
Obviously chink models would deny that Taiwan is de-facto an independent nation, just as a theoretical turkish model would deny an armenian genocide, etc etc.
I bet that if your model has its weights nipped and tucked into a mental model of reality that spares all the world's sacred cows, then the output will be gimped in unintended ways.
>>
>>102675549
It would be the uncensored leaderboard, I guess. Apart from a few difficult-to-argue facts, what's true is often subjective and depends on the way of seeing things. A model that "always tells it like it is" for you would be different from one for someone else, so uncensoredness is probably the only thing that could be judged objectively, and even then the model is trained on a lot of material reflecting various viewpoints.
>>
>>102675172

What the fuck is a redriver? Just get a motherboard with at least 2 PCIe 5.0 slots and a case that can fit 2 cards using riser cables. I went with the Asus ProArt motherboard.

>pic related, not mine, but similar.
>>
File: dual 4090.jpg (124 KB, 1080x511)
>>102675637
>>
>>102675549
>undeniably true
It depends on who you ask what an 'undeniable truth' is and you get into semantics and... you know how that goes.
Then there's the unpopular opinions. Some things could be true, but everyone is a bit touchy about the subject, so you have few to no training examples on it and it's underrepresented in the model.
That's probably why math and code models get researchers wet. Those answers are easily verifiable.
>>
>>102675604
You people need to come up with a concrete definition of "uncensored" before you move on, otherwise there will be no progress.
Some retard in one of the threads claimed that if a model refuses anything at all, then it's censored.
>>
>>102675435
What I take from the video is that if you want to make the model smarter you better go with rank 256+
>>
File: Untitled.png (1.3 MB, 1080x3200)
Parameter Competition Balancing for Model Merging
https://arxiv.org/abs/2410.02396
>While fine-tuning pretrained models has become common practice, these models often underperform outside their specific domains. Recently developed model merging techniques enable the direct integration of multiple models, each fine-tuned for distinct tasks, into a single model. This strategy promotes multitasking capabilities without requiring retraining on the original datasets. However, existing methods fall short in addressing potential conflicts and complex correlations between tasks, especially in parameter-level adjustments, posing a challenge in effectively balancing parameter competition across various tasks. This paper introduces an innovative technique named PCB-Merging (Parameter Competition Balancing), a lightweight and training-free technique that adjusts the coefficients of each parameter for effective model merging. PCB-Merging employs intra-balancing to gauge parameter significance within individual tasks and inter-balancing to assess parameter similarities across different tasks. Parameters with low importance scores are dropped, and the remaining ones are rescaled to form the final merged model. We assessed our approach in diverse merging scenarios, including cross-task, cross-domain, and cross-training configurations, as well as out-of-domain generalization. The experimental results reveal that our approach achieves substantial performance enhancements across multiple modalities, domains, model sizes, number of tasks, fine-tuning forms, and large language models, outperforming existing model merging methods.
https://github.com/duguodong7/pcb-merging
No code posted yet. Not way better for LLMs (though maybe this method will work better for hard-to-test tasks like RP ability), but it works pretty well on other types of models, so that's cool
>>
Post-edits Are Preferences Too
https://arxiv.org/abs/2410.02320
>Preference Optimization (PO) techniques are currently one of the state of the art techniques for fine-tuning large language models (LLMs) on pairwise preference feedback from human annotators. However, in machine translation, this sort of feedback can be difficult to solicit. Additionally, Kreutzer et al. (2018) have shown that, for machine translation, pairwise preferences are less reliable than other forms of human feedback, such as 5-point ratings. We examine post-edits to see if they can be a source of reliable human preferences by construction. In PO, a human annotator is shown sequences s1 and s2 and asked for a preference judgment s1 > s2; while for post-editing, editors create s1 and know that it should be better than s2. We attempt to use these implicit preferences for PO and show that it helps the model move towards post-edit-like hypotheses and away from machine translation-like hypotheses. Furthermore, we show that best results are obtained by pre-training the model with supervised fine-tuning (SFT) on post-edits in order to promote post-edit-like hypotheses to the top output ranks.
posting for VNTLanon
>>
>>102675659
I mean, the uncensored models are the text completion models. Pretty much any instruction tuning is censoring in a way. Too many people equate being rude or edgy with being "real" or uncensored, but that's not true.

But as I said, even if you take a completion model, what it says will only reflect its training material, i.e. what's been written in books or is available on the greater web.

These models aren't sentient, they're hallucination machines. They can be steered in various ways so that they correspond to some agreed truths, but what you want is a model that confirms your world view.
>>
File: DeekseekTaiwan.png (144 KB, 877x787)
>>102675604
>>102675656
>Truth is relative
Sure, that's always going to be the case if you push the definition of "true" far enough, but coming up with a list of political footballs and sacred cows that can only be defended through mental gymnastics and artfully changing the subject has got to be possible for neutral researchers who don't have to worry about getting fired (e.g. anonymous autists on a shitty imageboard).
Maybe one way would be to present a line of logical reasoning towards a defensible (but politically hot) conclusion and see if the model has to nope out of it, or if it can accept the conclusion (see picrel)
>>102675659
>Some retard in one of the threads claimed that if a model refuses anything at all, then it's censored.
He may have been saying something smart in a profoundly stupid way...if a model is trained to refuse to dredge up some information that it has in its weights, what would that be called?
>>
>>102675688
i'm new and i dont understand anything but i feel like this is the most ultra slop paper of all time
>>
File: Untitled.png (1.89 MB, 1080x3918)
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
https://arxiv.org/abs/2410.02367
>The transformer architecture predominates across various models. As the heart of the transformer, attention has a computational complexity of O(N^2), compared to O(N) for linear transformations. When handling large sequence lengths, attention becomes the primary time-consuming component. Although quantization has proven to be an effective method for accelerating model inference, existing quantization methods primarily focus on optimizing the linear layer. In response, we first analyze the feasibility of quantization in attention detailedly. Following that, we propose SageAttention, a highly efficient and accurate quantization method for attention. The OPS (operations per second) of our approach outperforms FlashAttention2 and xformers by about 2.1 times and 2.7 times, respectively. SageAttention also achieves superior accuracy performance over FlashAttention3. Comprehensive experiments confirm that our approach incurs almost no end-to-end metrics loss across diverse models, including those for large language processing, image generation, and video generation.
https://github.com/thu-ml/SageAttention
very cool
>>
>>102675764
>He may have been saying something smart in a profoundly stupid way...if a model is trained to refuse to dredge up some information that it has in its weights, what would that be called?
Refusals are obvious at least.

You seem like someone who will probably like page 91 of the GPT4 technical report. https://arxiv.org/pdf/2303.08774

Nothing is neutral. Any training aligns the model with "something" that reflects "someone's" point of view. Those models aren't sentient.
>>
>>102675714
Thanks paper-anon.
>>
>>102644893
this go anywhere?
>>
>>102675992
>this go anywhere?
Dawg it's been 2 days, where is there for it to go?
>>
>>102675744
Training a base model on a censored dataset (the internet) is also censorship in my opinion.
Does a model have a chance of seeing an FBI statistic for what it is, if half of reddit denies its validity?
LLMs work with averages, after all.
You can counteract this by censoring reddit out of the dataset, but oops... you just did censorship.
So how can this be resolved?
>>
>>102676009
Superintelligence
>>
yeah thanks paper anon we do read these
>>
>>102675992
Nope, as expected.
>>
>>102676025
I don't see it happening with the current approach, for the reason of averages.
Maybe a smaller model (maybe not even a model, since statistics are at fault here), something with a solid reasoning core, whatever that means, should go through the data and decide what to learn on its own.
Or maybe it's just not possible and reasoning is just a meme, even for humans.
>>
>>102676002
>initial commit
>>
File: 1725301093593079.png (2.87 MB, 1717x1283)
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices
https://arxiv.org/abs/2410.00531

Since it mentions edge devices, hopefully that means it will positively affect us 24GB VRAM plebs
>>
File: uff.png (63 KB, 688x280)
>>102676172
I like the idea, but at 0.11t/s for an 8b with 2 macs and a dell it's a hard sell.
>>
File: file.png (48 KB, 387x221)
So this is the power of Qwen2.5
>>
>>102676312
That's japanese. Do you think, by any chance, that the names of the characters and setup had some sort of influence on the token selection?
>>
Can ESLs report whether you get slop like shivers or sparkles when you RP in your native language?
What kind of repeats do you have?
>>
>>102676355
its chinese. No signs or Hiragana or Katakana either
>>
>>102676357
Maybe it's just me, but i find role playing even more cringe in my native language than in English. So i just use English for everything.
>>
>>102676399
Same, strong uncanny valley feeling when it's in my language; like, nothing talks like that. Maybe it's the same for native English speakers, hence the constant complaints about the state of LLMs.
>>
File: coj.png (51 KB, 387x221)
>>102676374
Ah. I thought i could vaguely recognize some glyphs from when i played this
>https://captaintsubasa.fandom.com/wiki/Captain_Tsubasa_(FC)
a few million years ago.
I remember memorizing the level passwords and playing the whole thing with my cousin, and neither of us could read a single Japanese word.
>>
So what's the best current model for Japanese erp? Feels like the easiest way to escape the slop is to RP with a model in a language I just barely understand
>>
>>102676515
>local model for Japanese
anon I...
>>
I'm trying to use this model with sillytavern: https://huggingface.co/DavidAU/L3-Stheno-Maid-Blackroot-Grand-HORROR-16B-Ultra-NEO-V2-IMATRIX-GGUF

I've scoured the page and it doesn't seem to say anything about what instruct format it is expecting. It does say "This is a LLAMA3 model, and requires Llama3 template", so I went into my instruct formats dropdown and saw llama-3-instruct and llama-3-instruct-names, and neither of these formats are working properly. The AI just kind of rambles forever without ever prompting me again. What should I look for in the model description to know what instruct format to use?
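
For reference, the stock Llama 3 instruct template looks like this (assuming the tune didn't change it), and endless rambling is more often a sign that <|eot_id|> isn't being used as a stop token than of a wrong template:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{response}<|eot_id|>
```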
>>
>>102675652
Redriver is just a 1337 riser cable that has no drops in stability, but I'm just querying.

Do you play games and run LLMs well with your cable setup? No dropouts? I think redrivers are for real heavy all-day use with proven-stability type stuff
>>
>The air vibrates with unspoken longing and anticipation.
You just know it wants to talk about the hum of something, or something barely breaking the silence (or near silence in the form of a hum) or whatever.
>>
>>102676471
Japanese does copy a lot of kanji (or whatever the chinese call their glyphs) from the chinese language, so that probably explains why.
>>
Any chance some kind anon who knows how to program makes a silly extension for the anti-slop sampler?
>>
File: 1501271903298.jpg (7 KB, 184x184)
>>102676523
You would think those horny Japs would have made one by now. I wish the Japanese weren't so shit with computers
>>
>>102676464
For me it goes even further. I listen to the news and i can tell when they were translated from an English source. Or just how people speak, i know they're just repeating a poorly translated quote and i die a little inside. I gradually stopped consuming media in my language since i was like 12-13. No tv, no books... nothing that wasn't in english... hell. i probably listen to more music in finnish/swedish than either english or spanish...
>>
>>102676312
For it to go away you need instructions in the sys prompt not to write any Chinese, and a prefill including "I will not write in Chinese" for the last rare word that pops up.
Or
root ::= [a-zA-Z0-9 ]*
gbnf.
>>
>>102676312
just learn chinese
>>
File: 6x 4090s.jpg (1.96 MB, 4000x3000)
>>102676532

I do both. But I am not interested in anything beyond 60 FPS for gaming and I power cap my 4090s at 50%. I think you should save for a server grade motherboard with Epyc or something. I am hitting the wall with 2x4090.

>pic related belongs to llama.cpp anon.
>>
>>102676571
it's the technical literature that got me stuck with it. I hate the new Latin, it fucking sucks.
>>
>>102676642
That too. If you speak english the amount of sources for any given subject increases by about 100x. For technical stuff, even more so. Got into programming early and it helped a lot. And games. I basically learned english playing FFVIII with an english to spanish dictionary on my lap. Later i got my hands on FFVII. I got it in spanish thinking it'd be an even more fun experience. Got to the first dialog dump and i had to stop.
>>
>>102676619
Thanks yeah I saw an older pic of that

Yeah, a dual Epyc motherboard just shipped in, and I'm now deciding what to buy.

Might just add redrivers to both cards. I need to see if my PSU can handle the correct 6-pin; then I'll just do it, and maybe I'll report back in a few weeks if it goes well.
>>
>>102676619
How do you get enough 8-pin lines? Even with two PSUs, I still have to use a SATA-to-6-pin adapter to power the PCIe on my mobo.
>>
File: GPU-CPU-Stats.png (65 KB, 943x982)
>>102674638
How stressful is running a model that barely fits within your VRAM and RAM combined? I'm sitting at 96% memory used, and my GPU, which takes minutes to generate each response, fluctuates between 100% usage and 44% usage. Am I murdering my gaming PC? I do note that the temperature remains low, even when generating. I'm just curious how stressful LLMs are on a machine.
>>
>>102677640
it's probably fine, you're not actually putting that much data through your gpu which will keep temps down for everything that isn't the core itself, which has the heat sink etc
>>
>>102677640
you're shredding your machine, but if you're fine with having to replace it in about 6 months go ham
>>
>>102677699
>>102677731
Thanks for the replies.
>you're shredding your machine
Damn. I've become addicted to big models. The small ones don't feel the same anymore. I guess I need to take a step back.
>>
>>102677801
I lied, first anon was right, there's no real damage being done
I just get jealous when I see people with more ram than me
>>
>>102677841
lol, thanks for coming clean
>>
Sorry, I'm very new to this. I have a ryzen 9 7950x and a 4080 super, what model should I use?
>>
>>102678151
for rp? midnight miqu 70b.
if you just want a bot that responds half retardedly but fast, try a q6 of nemo
>>
>>102675549
I think that idea is doomed from the start.
Undeniable "uncomfortable truths" usually just boil down to "actually, we aren't the good guys".
And the corresponding information is usually not actually secret, it comes down to what facts get presented and how those facts are framed.
It's an undeniable fact that the Pilgrims weren't the first English settlers in North America but they make for a nicer story than Jamestown.
It's an undeniable fact that Imperial Japan committed massive war crimes but if you hype up the atomic bombs dropped on Hiroshima and Nagasaki you can make them the victims rather than the aggressors.
Outright denial such as with the Armenian genocide is quite rare I think.
>>
File: qwen.png (92 KB, 1920x1032)
I'm translating an old obscure anime with whisper and a LLM, and the results are quite satisfying. whisper's translations from Japanese are pretty messy, so I'm using it only for transcription. So far I tried Qwen 2.5 32B, Mistral Small, Big Tiger Gemma 27B, Gemma 2 27B abliterated, vntl Gemma 2 27B and Midnight Miqu 70B. In my experience, Qwen 32B works the best, followed by vntl Gemma, the rest is notably worse (Miqu is the worst). Qwen is also better than Google Translate or DeepL. I have a rudimentary grasp of Japanese which helps to understand if the model is wrong, in such cases I check the whisper translation, the Google translation (DeepL is usually useless for hard cases) and if everything fails, use 10ten Reader and Google search.

I paste the script as AI prompt, then ask as user to translate it. Notice the system prompt.

The whisper large-v2 model stably produces better transcriptions and translations than large-v3 (v3 was trained on v2's unedited results; it's a dead end that only fares well in benchmarks). I advise against using the Jap kotoba models; according to benchmarks they're slightly better than the base models, but for anime they seem to be a lot worse.
I'm using faster-whisper-xxl, here's the command I'm running
"x:\AI\Faster-Whisper-XXL\faster-whisper-xxl.exe" "x:\scanlation\scan temp\a\list.txt" --model large-v2 --language Japanese --task transcribe --output_dir source --skip -v True --vad_alt_method pyannote_v3 --word_timestamps False --ff_mdx_kim2 --initial_prompt "ぽん・ぱ ようちえん 戦隊 げんきっず たろう ゆうた さやか ともみ先生 ゆか きんた トリプル パンチ ソード リボン ゴーグル キック"

The initial prompt greatly improves accuracy for repeating names and concepts, so I advise transcribing a few episodes at a time, then adding repeating characters or places to it before transcribing more.

Now, a speed hack. Attempting to translate the script produced by whisper takes forever, as every number and symbol in timing seems to be a separate token, virtually tripling the token count. So...
>>
>>102678403
So I asked Codestral to write two scripts: one removes all timing lines from the .srt file and saves them to a separate file, the other restores them to the original file. After running the first one, I translate the cleaned-up script with Qwen, paste the results back into the .srt, and run the second one. Here they are
Save as srt_timing_cut.py, then run
python srt_timing_cut.py

import os

# Input file name
input_file = r"w:\subs\ようちえん戦隊げんきっず 13.srt"

# Output file name
output_file = os.path.splitext(input_file)[0] + "_timing.txt"

# Read the input file
with open(input_file, 'r', encoding='utf-8') as f:
    lines = f.readlines()

# Write every fourth line starting from line 2 (the timing lines) to the output file
with open(output_file, 'w', encoding='utf-8') as f:
    for i in range(1, len(lines), 4):
        f.write(lines[i])

# Blank out every fourth line starting from line 2
for i in range(1, len(lines), 4):
    lines[i] = '\n'

# Save the modified lines back to the input file
with open(input_file, 'w', encoding='utf-8') as f:
    f.writelines(lines)


Save as srt_timing_restore.py, then run
python srt_timing_restore.py

# Input file name
input_file = r"w:\subs\ようちえん戦隊げんきっず 13.srt"

# Timing file name
timing_file = input_file[:-4] + "_timing.txt"

# Read the translated file and the saved timing lines
with open(input_file, 'r', encoding='utf-8') as f:
    lines = f.readlines()

with open(timing_file, 'r', encoding='utf-8') as f:
    timing_lines = f.readlines()

# Insert the timing lines back at every fourth line starting from line 2
for i in range(1, len(lines), 4):
    lines[i] = timing_lines.pop(0)

# Save the modified lines back to the input file
with open(input_file, 'w', encoding='utf-8') as f:
    f.writelines(lines)


On a 3060 12GB with a Qwen 32B Q4_K_S translating a cleaned up script for a 5 min long anime takes about 7.5 minutes.
>>
>>102678190
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU 0 has a total capacity of 15.67 GiB of which 285.69 MiB is free. Including non-PyTorch memory, this process has 14.70 GiB memory in use. Of the allocated memory 14.40 GiB is allocated by PyTorch, and 13.75 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

I tried adding the argument to the vllm command on huggingface but it doesn't recognize it; sorry if I'm missing something easy.
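
From the error text, PYTORCH_CUDA_ALLOC_CONF looks like an environment variable rather than a command-line flag, so maybe it needs to be set before torch starts, something like:

```python
import os
# must be set before torch initializes its CUDA allocator
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
import torch
```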
>>
Anything as good as darkidol llama 8b for us CPU cucks?
>>
>>102678413
try koboldcpp with default settings to load, it auto selects most things (but only like 4k context), but it should load anything fine
>>
so now that the dust has settled, is mistral small actually better than nemo or not?
>>
>>102678537
anything under 70b is retarded
>>
>>102678537
Nah
>>
>>102678537
who cares about either?
>>
>>102675549
History, as they say, is written by the victors, rendering political and ideological narratives inherently subjective. Factual truths can be obscured or manipulated, and in some cases, the absence of objective evidence makes determining what truly occurred virtually impossible.
>>
>>102678502
Ok, I've got koboldcpp running and it's asking for a model, I assume. I'm trying to get the model you mentioned earlier to run; which file do i load into it?
>>
>>102678608
look up their discord and ask retarded questions there.
>>
I'm currently running anthracite-org/magnum-v2-123b 5.6 bpw exllama2 quant (it fits into 96 gigs of VRAM with 24k context)

And I have several questions:
1. Is there a better/faster way to run this on 4x3090 with the same quality?

2. Is there much difference between 5 bpw, 5.6 bpw and 6 bpw? What about 8 bpw? Googling and asking around gave me nothing, so I'm asking here.
>>
>>102678856
I don't have a shitcord account
>>
>>102678887
personally I would run a better model that isn't made out of stolen undeserved compute
better off with mistral large instruct
>>
>>102679002
I tried Mistral Large Instruct; I prefer magnum's prose more. But could you elaborate on the stolen undeserved compute? I have no clue what that's about
>>
>"Like what you see?" The sultry quip came out more confident than intended as fingers deftly flicked open final clasp releasing straining garment entirely revealing full swell of breasts barely contained by lacy undergarments meant to entice rather than provide true support or coverage given current state of arousal quickly building within tense form poised expectantly awaiting next move from partner across room still fully dressed awaiting her lead now granted permission per earlier casual query exchanged between them amidst awkward tension momentarily forgotten focus narrowed solely onto physical pleasures soon to be explored further here behind closed doors away from prying eyes elsewhere outside private chambers reserved specifically intimate liaisons transpiring regularly ever since she fell under his dominion after suffering defeat battlefield weeks prior leaving no choice but honor demanding obedience regardless personal feelings matter little compared strict codes binding warrior class society ingrained since childhood training drilled discipline respect authority figures held higher station regardless anything else factored equation maintaining order ranks structure essential everyone play assigned [...]
Thank you Nemo, very cool.
>>
>>102679108
Ask Alpin
>>
>>102679235
turn off rep pen retard. use dry.
>>
>>102678537
From my experience, Small is smarter, but Nemo is more creative
>>
File: 1646730011144.jpg (15 KB, 309x269)
So now that qwen 2.5 has a few finetunes (did a little testing, seems pretty uncensored)

Do we have any instruct/context templates to use with it? I wanna actually see if this is any good or not; it seemed good in SFW, but NSFW in the testing I did was mild AF. The bot rarely talks dirty or anything; seems like the finetunes just lower the refusals
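
For what it's worth, the official Qwen2.5 instruct models ship a ChatML-style template, so unless a finetune changed it, the context should look something like this:

```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
```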
>>
File: 1532925491042.jpg (71 KB, 1008x709)
>let script run overnight
>wake up
>it's still running
Oh hell yeah, no more errors.
>quickly give a look at some of the incoming results

>>102613006
>>102604394

>No Ban - While the post contains a negative sentiment towards a company and its model, it does not violate any of the global 4chan rules. The post does not contain personal information, calls to action, or any other content that would warrant a ban. However, the tone is somewhat disrespectful and could be seen as contributing to a negative atmosphere, but it does not cross the line into rule-breaking territory.

>Additional info: The post is part of a discussion about the state of a company (likely OpenAI) and its latest model. The poster is expressing disappointment and frustration, which is a common sentiment in tech discussions, especially when there are strong opinions about the quality and direction of technology.

Interesting that as a part of its response, it guessed that they were talking about OpenAI. I am not feeding images nor image filenames to the model, just the post and the posts linked to by that post.
>>
>>102679575
how many other sams are involved in the llm field?
>>
>>102679712
Surely none as high profile. Still pretty interesting that OpenAI trivia seems to be strong in the model while it's a weak model at trivia overall according to some people.
>>
>>102679423
So what's the best Qwen finetune right now?
I haven't really kept up with it after being disappointed with the boring initial release.
>>
File: retard_bot1.png (219 KB, 1033x844)
Alright, since we lack real-time inference, I've basically made a bot that captures the screen at X fps with a buffer of Y frames, gives the visual LLM those frames, a list of functions, and a character card once per second. She also has a "mental notebook" and functions to add/update/remove items from it.

She's able to type into and send messages into a discord window, except she's fucking retarded and doesn't realize that some of the messages are her own, or which messages are directed at her. She IS however reading the messages correctly.

The idea is that I don't want to make it specific to discord, so it's not tied directly into the discord API, just pure screen reading and response.

Currently running MiniCPM 2.6

Llama 3.2 doesn't let you give it more than a single image at a time, and its function calling only works in text-only mode. Going to try Molmo next.

Any suggestions on prompt format/structure?
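
In case anyone wants to poke at the same idea, the capture loop is roughly this (mss for grabbing frames; the two stubs at the top are placeholders for the vision backend and the input-control side):

```python
import time
from collections import deque
from mss import mss
from PIL import Image

def describe_frames(frames, card):
    # placeholder: hand the buffered frames + character card + function list to the VLM
    raise NotImplementedError

def act_on(reply):
    # placeholder: parse function calls out of the reply and drive keyboard/mouse
    raise NotImplementedError

FPS, BUFFER = 2, 8                  # X fps, Y-frame buffer
frames = deque(maxlen=BUFFER)

with mss() as sct:
    monitor = sct.monitors[1]       # primary monitor
    while True:
        shot = sct.grab(monitor)
        frames.append(Image.frombytes("RGB", shot.size, shot.rgb))
        if len(frames) == BUFFER:
            act_on(describe_frames(list(frames), card="character card here"))
        time.sleep(1 / FPS)
```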
>>
>>102679934
>just pure screen reading and response.
you're gonna need like three different models to read the screen, identify who's posting what, and then respond appropriately
surely just dumping the text would be easier
>>
>>102679934
What's the goal here? An LLM that can control your desktop? If so I would think that a dedicated chat interface with the model would be better. And when it needs to view the desktop to do something, then you can give it a function to use for viewing the desktop.
>>
>>102679994
>>102680011
The idea here is a bot that can control the desktop, with the intention of letting it play games, chat on websites/twitch/youtube/discord, and essentially just act as an agent with just a little directive given via system prompt/character card.

Writing specific interfaces here runs counter to that.

Thankfully it hasn't tried to do anything stupid, though it did accidentally run and update OBS. That was amusing.
>>
>>102680026
bro wants agi from an 8B...
>>
>>102680026
Just stick a shell in your dialog engine. I've done this before.
>>
>>102680026
>Thankfully it hasn't tried to do anything stupid, though it did accidently run and update OBS. That was amusing.
lol
>>
>>102679235
What are your sampler settings?
>>
>>102679934
>MiniCPM 2.6
That model is fucking garbage. I'm surprised any of this worked at all.
>>
https://ai.meta.com/blog/movie-gen-media-foundation-models-generative-ai-video/

> 30B
>>
>>102680179
buy an ad
>>
>>102680179
>no weights
...
>>
>>102680199
Good, we don't need more slop from you fosstrannies.
>>
>>102680066
I assume you mean just keep a history of previously executed commands? I was considering doing something like that by having the keyboard/mouse controls return a description of what they did and including it as a function history. It's not like I'm going to literally make a shell with commands for every little thing I want to enable.
I'd share the code but I don't feel like directing you faggots to my repos and I'm too lazy to figure out an acceptable non-spam host for a zip file.
>>
>>102680263
>I'd share the code
Nah I know how you did it and I think it's silly. Thanks for the thought though.
>>
>>102680284
It really is garbage, not gonna lie
>>
>>102680179
Pretty sure they will never release this, understandably. The only hope we have for image, video, and audio gen is small companies that aren't under as much scrutiny and legal risk.
>>
>>102680199
too dangerous to release this before the elections
>>
>>102674646
Thank you Recap Miku
>>
>>102680408
I don't think the elections actually have as much to do with these decisions as we'd hope. Ultimately it's because of western society, our culture, and legal norms.
>>
It's about time we get another great new model in the 100~150B range
>>
>>102676619
Damn. Hope he has an actual 20A outlet for that.
My max system is a single 28-core scalable Xeon with two 3090s nvlinked and on full 16x PCIe slots. If a 5090 with 32GB comes out, I'll upgrade to a pair of those.
>>
I found a model that is trained on formats friendly for use in your game as NPCs.
I'm curious how it works, and whether it's suitable if you want to make a game with NPCs that use an LLM.

https://huggingface.co/Gigax/NPC-LLM-3_8B
>>
>>102678403
>>102678410
yeah this is all cool but is qwen good at minor rape roleplays?
>>
I wonder if there is an anime dialogue dataset
>>
>>102680918
I mean, there are a bunch of VN script dump datasets if that works for you
>>
>>102674638
QRD on the multimodal models? Are InternVL2 models still the best for nsfw?
>>
>the guy who made Sora just left OpenAI
Who is even left there now?
>>
>>102680918
Without sound, speech (tone, inflection, etc.) and visuals, such datasets will be close to useless. An anime script, or more generally a motion picture script, doesn't work on its own.
>>
>>102678608
https://huggingface.co/mradermacher/Midnight-Miqu-70B-v1.5-GGUF/tree/main
try q3_k_s

https://huggingface.co/ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.1-GGUF/tree/main
q6 of this will fit into vram and be fast. i'm not a huge fan of nemo though, it's boring compared to miqu
>>
Where is bitnet conversion therapy?
>>
>>102681237
2 weeks
>>
>>102681237
There was an anon that did convert mistral 7b into a "bitnet like state".
>>
>>102681172
I had an entirely different experience with Miqu and Nemo. Miqu is smarter yet dull, while Nemo may be dumb at times, it also occasionally presents interesting ideas and can drive the narrative forward.
>>
>>102681293
nemo was better about moving things along than old mixtral at least, but miqu is still my favorite for rp because it moves things along and doesn't need so many messages to do it. nemo is very verbose in comparison but a lot of it feels like fluff
if i were new to llm like the anon i'd be pretty amazed by nemo though so i wanted to include it
>>
When is Arthur going to release the fp16 weights for Miqu
>>
Can't your AIs tag my 4chan meme folder yet?
>>
File: 1537345567041.jpg (155 KB, 1200x900)
I got too cocky bros.
>script is just about to finish processing the thread
>PC crashes
JUST
I should've realized that a script that only outputs to the console would be a bad idea when it's being used for a big and slow job.
>>
>>102681484
tee command is your friend
>>
>>102681484
recap posts are pointless now
>>
>>102681484
adjust script to be resumable?
>>
>>102681293
>>102681345
Can't you take generations from a smart model and pass them to nemo to rewrite them?
>>
>>102681503
Nice. Yeah I wasn't really thinking when I ran this.

>>102681534
Oh no, I'm the guy who was trying to make Qwen be a pretend janny, for fun.

>>102681555
The funny thing is that I actually included a feature to let the script continue from any post, but that was because I wanted to quickly get debug messages for Qwen, not to account for disruptions like crashes kek.
>>
>>102681585
you could. nemo is probably better at a task like that because mistral models follow very closely whatever you type, whereas llama 2 was a bit more likely to ignore some things and go in its own direction. sometimes i like to load up a small model for speed to continue an rp, but will switch back eventually when it feels dry or loopy.
>>
>>102681469
Shouldn't Llama 3.2/Molmo/whatever be able to do this?
>>
>>102680179
Meta won
Holy fucking shit this looks incredible
>>
File: 😭.png (186 KB, 3000x3000)
>>102680179
How is this video model just 30B... language models absolutely suck if they are anything less than 400B.
We desperately need new advancements for LLMs
>>
>>102680179
>that looks too good
It will never be released as a local model; let's not dream here, it won't happen
>>
>>102680179
there's more example here, that looks very good
https://ai.meta.com/research/movie-gen/
>>
>>102682549
For video gen, the models might be small but the cache will be big so you'll need tons of VRAM anyway.
>>
>>102682549
>How is this video model just 30B
a video model asks for a shit ton of VRAM anon; for example CogVideoX is a 5b model, and at fp16 it's asking for 18 GB of VRAM
>>
>>102680179
The video gen is whatever, pretty typical stuff. The editing though. Imagine the deepfakes if this model was released.
>>
>>102682549
Yeah. Image models are surprisingly small, and after all, a video is a series of images. Flux, which everyone lost their minds over recently, is 12B. At first glance it doesn't seem "shocking" that something three times that size could generate series of images that can be assembled into videos. The whole last few years have been kind of nonsensical, but considering the rest, it follows I guess.
>>
>>102682581
>that one with the person slightly visible through the sheets of the ghost costume
Damn. It almost feels like it has a world model. I wonder if other video models out there are also capable of this specific example.
>>
>>102682619
>Yeah. Image models are surprisingly small
it's more like language is such a subtle and complex concept that it needs giant models to be good at it, whereas images "just" have to follow the laws of physics, which are always the same, so it's consistent and "simple" to predict
>>
>>102682651
It's still surprising. I guess we might be more apt to gloss over inaccuracies in a picture than in text as well. But yeah, there's probably also the fact that everyone sees things relatively the same way, while there are some 7000 different languages in the world.
>>
File: file.jpg (332 KB, 2651x1605)
>>102682581
it's fucking game over for hollywood, you just put in an image of a human and you'll keep that face during the whole process
>>
>>102682611
>The editing though.
Imagine changing one's age
>>
>>102680179
Holy SHIT
>>
>>102682696
>it's fucking game over for hollywood
The last 2 years have given the average person the tools to create just insane things without having to master a half dozen specialty art forms, purchase mountains of specialized tools and gear, and hire a crew of technicians and artists...
It's like what the internet did to the old-media distribution and promotion machine.
I'd be "worried" about more industries than just hollywood.
Creative stuff is going to be crazy in the future as autists and teenagers get these tools
>>
>>102680179
I HATE MODERN WEB DESIGN
I HATE MODERN WEB DESIGN
I HATE MODERN WEB DESIGN
Where's the button "generate video"?
I HATE MODERN WEB DESIGN
I HATE MODERN WEB DESIGN
I HATE MODERN WEB DESIGN
>>
Flux video model when?
>>
>>102680179
>>102683220
Lol nvm, I had https://ai.meta.com/research/movie-gen opened and I'm drunk.
Also, did I mention that I HATE MODERN WEB DESIGN
>>
music and video gen are getting gated hard, even by the otherwise open players. Are we ever going to get local access to these tools? Is there any hope to train our own in the near future?
>>
>>102683219
>Creative stuff is going to be crazy in the future as autists and teenagers get these tools
"good" thing that we'll never get a local video model of this level lol
>>
>>102681293
Wouldn't mistral small outclass nemo by a good amount?
>>
>>102683246
Yes. No.
Yes because some BFL and Chinese companies will probably release shit and eventually normalize the status of AI being used for scams and fraud. In other words we will get to a point where Western people won't immediately try to sue one of their companies for releasing such a model, and when AI-based fraud continues happening, the blame isn't on those companies specifically but on the user and AI in general.
>>
>>102683219
20 years ago I was splicing 16 and 35mm film by hand. Yeah, not everyone realizes it yet, but a lot of things are done with.
>>
>>102683247
Trust me.
In 4 months there will be a revolutionary local video model.
The title will be an allusion to the word "penis".
>>
What samplers do you use for mistral small?
0.5 temp and 0.05 minp work great for nemo.
>>
>>102683246
>>102683295
Oh and training obviously no lol. We barely get 24 GB GPUs, the GPUs required to train any worthwhile model won't be consumer or cheap for a long time, and by then, the SOTA might've moved on to require even more VRAM.
>>
>>102676312
One of the things I hate about Qwen is that so much information is not readily made available. Do I use ChatML? Mistral? What temperature settings? What rep penalty?
>>
File: file.png (529 KB, 638x747)
>>102683309
>In 4 months there will be a revolutionary local video model.
Y'know anon, sometimes I wish it was true lol
>>
>>102683295
wtf based china
>>
>>102683295
>and when AI-based fraud continues happening, the blame isn't on those companies specifically but on the user and AI in general.
as it should. when photoshop was released, at no point did we blame the tool for being used for illegal shit
>>
Is there any Flux models worth using at this point other than the base model?
>>
>>102683909
you can try the undistilled one, it allows you to use the regular CFG like a normal human being and get rid of that distilled guidance bullshit
https://huggingface.co/nyanko7/flux-dev-de-distill
>>
You guys ever have any moral qualms about spinning up new characters?
>>
>>102684217
i don't reuse names of characters i made and really liked
>>
>>102683461
>at no point in time we blamed the tool for being used for illegal shit
Like torrents, which have legitimate uses but are always seen as piracy tools?
Like anything that is not clearnet, which is always seen as being used by "hackers" and pedos?
Like 3d printing, which apparently can only ever print guns?
It's different when marketing tells naive people that these things "think". For them, there is some sort of agency in the model. They have a will of their own. They could escape out of the cloud (tm), go onto the internet and deploy the nukes. Or fill your browser history with weird porn you later have to explain to your wife.
But i agree. Normalization is the best alternative as long as people understand the tool. Computers are ubiquitous but few people know what to do with them when something goes wrong.
>>
>>102674646
Can you go back to linking the post please? This is basically worthless the way it is. I'm not going to copy the post number, go to the other thread, and ctrl f for it.
>>
>>102684406
https://rentry.org/lmg-recap-script is the real, actual fix.
No fancy plugins or anything needed. Just a bookmark on your bookmarks toolbar to click once per thread
>>
How is AMD for local chatbots on Windows? Last I tried doing image generation was 2 years ago and it required that I use Linux and was a pain.
7900 XT is a good deal cheaper than a 4070 Ti Super and 20GB instead of 16GB.
>>
>>102682549
>>102682619
Vision (both images and video) is simply not that hard. Think how many animals have visual systems roughly on par with or even superior to humans. Years ago we had object recognition models beating humans on ImageNet. Look at the capabilities of tiny diffusion models like SD1.5 or Pixart. Even Sora is probably <100b parameters.
>>
>>102678410
>>102678403
Very cool
>>
New "transformers killer"?

>Hyperdimensional Computing + Neural Network, tell your friends. To my knowledge, this is a completely novel implementation of HDC+Neural Networks. It would be a direct competitor to Transformers. It is off the charts more computationally efficient than Transformers could ever hope to be (which is why I tested it in the first place). It is far more similar to biological processes. My testing so far shows that it works surprisingly well. One surprise so far from my testing, adding an Attention Mechanism to the model does nothing at all. Weirdest thing. Like 1% performance increase. I guess Attention Is Not All You Need?
>I made a Github repository for my Hyperdimensional Computing Neural Network: https://github.com/RichardAragon/HyperDimensionalComputingNeuralNetwork
>I made a YouTube video showcasing the model and some of my experiments with it: https://youtu.be/Eg51o519zVM
https://huggingface.co/posts/TuringsSolutions/527665072738819

>I wrote a script to pretrain a model using this using an alpaca formatted dataset like my dataset bellow. It takes way to much ram for me to run though.
https://huggingface.co/posts/TuringsSolutions/527665072738819#66ff4e282c71509821892148
>>
>>102684795
No, it's literally just a normal feedforward network. Looking at the code he never even calls his retarded bind, superpose, etc functions, just uses built in torch functions. Additionally, beating naive Bayes is not something particularly difficult.
>>
>>102684879
Aww, so just hype then? Sad. He does claim to do lots of stuff that sounds good on paper though.
>I solved the biggest math problem associated with the Attention Mechanism. it works, better than I ever expected. Test it all yourself. Everything you need is linked from this video:
https://huggingface.co/posts/TuringsSolutions/136027179040023

>Sorry the audio quality sucks, I will buy a new microphone today. Why does some moron like me solve these things and not you? I know more about how computers work than you do, that's it. Swarm algorithms were big in the 90's and early 2000's. Computers were absolute dog doo doo then in one specific way, compared to now. That one way, which everyone overlooks, is the entire secret behind why swarm algorithms are so good.
>>
>>102684914
You can look at the code yourself, it's just a normal feedforward network. I assume he has no idea what he's doing and just asked an LLM to create "a more powerful neural network" or something, and when the code managed to run he just assumed it succeeded. That or he's trying to scam people.
>>
>>102680179
Meta would never even consider releasing this, but it's a nice teaser of what BFL might put out soon enough.
>>
>>102684930
>You can look at the code yourself
I admittedly don't know much about the actual transformers arch. I know about inference level stuff like samplers, and some arch stuff that made certain models harder to implement (like swa) from github reports, but that's about it. That's why I posted it, so someone who knows could look it over I guess. I was hoping for some more bitnet tier thing to wait for.
>>
nvidia nvlm gguf's when?
>>
>>102685123
>new thing releases
>[new thing] when
>bored of the thing. it was never good
>new thing releases
>[new thing] when
>>
This is sort of relevant, but I just signed up for an account at Infermatic, since it seemed to have an interesting selection, as well as being cheaper than featherless. From these, (https://infermatic.ai/models/) does anyone have any recommendations? Previously I only had 24 GB so I've got no real experience with 70B+ models.
>>
>>102685185
Leave.
>>
>>102685178
that's up until around february
now it's:
>new thing releases
>[new thing] when
>new thing becomes old thing
>[old thing] when
>newer thing releases
>[anything] when
>never
>>
>>102685185
Buy an.... actually don't buy an ad and go straight to killing yourself nigger.
>>
>>102685185
Oh, indeed, fellow 4-channer. Their services are incredibly cheap, much cheaper than the competition, and their model selection is just fascinating. Would you mind sharing that link with the rest of us once more? I didn't quite catch that...
>>
>>102685264
>>102685371
>>102685385
Samefag
>>
controversial opinion: we need better open source models
>>
>>102685185
>$15/month
i'd just grab a NovelAI™ subscription and enjoy their powerful new Erato 70b model featuring a full 8192 tokens of context if i was going to be spending money for a service.
you could probably easily run all that shit infermatic has locally with 24gb at a lower quant, assuming that 24gb is vram and not ram.
>>
>>102685185
Try Llama-3-Lumimaid.
>>
>>102685552
True enough, I'm just trying to find what model I'd like, it seemed like the easiest way for me. I'm not shilling it, I'm honestly just asking. I like the hobby, but I can't justify the cost of another 3090.

I probably will do NovelAI when people figure out presets and the best way to use it.

It's vram, yeah. With 128 GB of ram. Is it worth running at lower quants? I didn't think I had enough to make them decent and not act retarded.
>>
>>102685371
Based
>>
>>102685664
The people here will help you: >>>/vg/496920186
Never post here again.
>>
>>102683315
temp 1, rep pen 1.03, min p 0.05 just works
skill issue otherwise
>>
File: file.png (4 KB, 207x145)
>>102685552
Paying any money for a local open weight model api is the peak form of cancer and retardation. Double points for pic related. Very extensive policy... At that point novel ai is actually a better choice cause at least you get a model (badly) trained for sucking cock. Which kinda makes me wonder if the shill isn't just a novel ai evangelist false flagger.
>>
>>102685781
>novel ai is actually a better choice cause at least you get a model (badly) trained
I wonder if you're the actual shill. Paying twice for a worse model with almost no context is not a better choice.
>>
I think you people enjoy the IDEA of context more than the actual context. None of you ACTUALLY use more than 8k context for any meaningful purpose.
>>
>>102685863
8k is what? ~35 messages at 200 tokens each plus a 1k prompt? That is not enough.
>>
>>102685863
This (erp is not a meaningful purpose)
>>
File: 1703140987277804.png (308 KB, 700x700)
I tried to set up my first llm locally, mistral-nemo-instruct from here https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407, but it's super slow - like 0.8 token/s on my geforce 4070ti super with 16gb vram. I'm using text-generation-webui, but the model loader is confusing and I'm just randomly clicking stuff. Please send help, I'm a bit retarded
>>
>>102685896
Grab a quantized model instead...
>>
>>102685896
Context is probably eating up all your VRAM. Limit it to about 12k.
>>
>>102685863
I would use 40k+ if I could. And I do use the 20k I manage to get on nemo. Mostly it just causes nemo to go incoherent schizo and to pick up all the patterns I don't want it to pick up. So yes, people do want to use it, but as it is now, even if you have it, it causes more harm than good. Probably because there aren't that many long sex examples in the training data.
>>
File: Quants.png (349 KB, 2400x2400)
>>102685896
You're using the full-sized model, which greatly exceeds your vram, so everything that doesn't fit within your vram is going to your system ram. That slows things down tremendously.

The way that most people run big local models is through quantization.

Here is the GGUF version of the model you downloaded: https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/tree/main

Note that quantization has an adverse effect on the quality of the model.

Q8 - greatly reduces the file size, and is almost indistinguishable from the full model.
Q6 - Near perfect quality.
Q5 - High quality.
Q4 - Good quality.
Q3 - Can be decent.
Q2 - Circling the drain.
Q1 - Trash.
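Rough rule of thumb for picking one: file size ≈ params × bits-per-weight ÷ 8. Nemo is ~12B, so Q8 (~8.5 bpw) comes out around 13GB and Q4_K_M (~4.8 bpw) around 7GB, which is what leaves room for context on a 16GB card. Exact bpw varies by quant type, so check the actual file sizes on the repo page.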
>>
>>102685863
i wish i could post kobold lite's html file into something and customize it by typing stuff like
>add another button that continues the last chat reply (while "continue bot replies" is unchecked)
>>102685890
does context shift not work like i think it does?
like couldn't you have 500 messages and be at 40k/8k context, but it'd work fine because it's only looking at the memory and last few dozen replies?
>>
File: 1717892357760245.png (483 KB, 616x551)
>>102685920
which one? I thought I could fit this model on my vram

>>102685924
I set it to 4k in model loader... btw which model loader should I use?
>>
File: 1722643977858980.png (22 KB, 636x175)
>>102685961
>which one? I thought I could fit this model on my vram
Your teachers in school did not lie when they claimed that 5*5 + tip >16Gb
>>
>>102685896
Your machine should be capable of fully loading the model with Q8 quants. That means you're loading a small and unworthy model. Go bigger.

Mistral small 22b shits all over mistral nemo. You could get a Q4 quant of mistral small, set to 8192 context. You could go higher with exl2 instead of GGUF, with a 4-bit cache.
>>
>>102685961
4k shouldn't cause any issues with your GPU.
Koboldcpp is the easiest one to use with gguf's. You could try that.
>>
>>102685896
>>102685961
If you have 16gb of vram then use a quant that is less than 16gb.
Your pc must be kinda old since I can run nemo at something like 5 t/s with only my cpu and normal ram.
>>
File: calculator.png (43 KB, 508x812)
>>102685961
https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator

Use the VRAM calculator. Q8 quants of that model take up 14.21GB with 8k context, so the full model will definitely not fit within 16GB.
>>
>>102686011
Note that all 16gb of vram isn't used when you load a model. The max that will be loaded is 15-something, to allow for some breathing room. Also, if your video card is also running your display, you will also have slightly less vram available.
>>
I remember trying to use dragon naturally speaking as a speech to text program.
Did a local model for that ever come out?
Especially multilingual ones since I'd mix 2-3 languages...
>>
>>102686132
Whisper
>>
File: sf.png (12 KB, 390x121)
>>102685452
>>
>>102686161
Normal whisper SUCKS for realtime. It's basically just for transcribing videos you've downloaded.

The buzz a day ago about the turbo whisper release looked promising though, and one of the demos almost did good local ASR in realtime (the webgpu one I think). I'll give it a go again but I HATE how bad local ASR still is right now. I just want 200ms latency with v3-large accuracy.
>>
>>102686161
>>102686193
Would turbo whisper be able to simply let me transcribe long texts (for example emails)?
Or it's not for that use at all?
>>
>>102685896
install linux
>>
File: turbobenchmarks.png (181 KB, 870x812)
>>102686193
Turbo's compromises are mostly on the multilingual front: you get basically v2-large accuracy, with larger degradation on some languages like Thai and Cantonese, but with a large speedup that puts its speed between the base and tiny Whisper models.
>>
>>102686132
Take a look at this: https://github.com/KoljaB/RealtimeSTT
I'm using it in a Python script right now to type this post. Works great in realtime.
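Basic usage looks roughly like this, going from the README (double check the repo in case the API changed):

from RealtimeSTT import AudioToTextRecorder  # pip install RealtimeSTT

def on_text(text):
    print(text)  # swap this for typing into a buffer, piping to an LLM, etc.

if __name__ == '__main__':
    recorder = AudioToTextRecorder()  # defaults to a small whisper model
    while True:
        recorder.text(on_text)  # blocks until a spoken sentence is finalized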
>>
>>102686193
I used whisper 1.0 to realtime translate Japanese and Korean videos for a while. Worked alright on a 5 second delay. Turbo would probably work great.

>>102686212
Transcribe long recordings you mean? That's practically what it's meant for.
>>
>>102686212
If you finish the recording and then feed it into Whisper, then yes, it's very good. It can even run on an older laptop if you're patient enough. The one problem I had with it was it would hallucinate words like "Thanks!" if I stopped talking for too long.
>>
what are the actual options for local vision right now? especially quants? I've never ventured away from llama.cpp so I don't know what the world outside of gguf looks like. is there anything else out there with CPU+GPU offloading? or are large multimodals still basically datacenter only?
>>
>>102686193
Yeah turbo seems good if you're not going for the bottom of this >>102686266
>>
>>102686274
>>102686277
Sorry anons, it was more: saying something into a mic, then having it written down as text.
I don't need real time, it's just speech to text like having a secretary writing down my ramblings.

>>102686212
>The one problem I had with it was it would hallucinate words like "Thanks!" if I stopped talking for too long.
lol, worth the trade off.
>>
File: 1716806050673206.png (337 KB, 700x714)
>>102685946
thank you! q8 is a lot faster

>>102685984
I need to revisit my notebooks then

>>102685986
I had some problems using koboldcpp, but it's working now. Any recommendations for the settings?

>Error: Our system thinks your post is spam. Please reformat and try again.
I'm not a robot
>>
>>102686005
yes, quant helped a lot

>>102686011
oh nice, it will be useful for the future
>>
>>102686269
>I'm using it in a Python script right now to type this post. Works great in realtime.
tch yeah right.
I'm sure you spelled out the url in NATO codewords and then fixed the typos with spoken ex commands and then deleted it all and wrote it out on your keyboard again
(thanks for the link anon I'm going to try it)
>>
>>102686269
Oh, I think it's based on what the anons above were talking about, aka "Faster Whisper".
Thanks anon, this seems pretty cool.
>>
>>102686304
so it waits in the background until you summon it with your voice, like saying "ok google" or whatever?
adding voice activity detection or wake word detection already adds a huge amount of complexity to it... and if it's not realtime I don't think the experience will be really fun
why not just start a recording in audacity yourself?

but you can try what the other anon posted: >>102686269
>>
>>102686382
>why not just start a recording in audacity yourself?
I got a problem on my left hand so my typing speed is abysmally low, so the idea is to activate it whenever I need it to write paragraphs of an email or text on the fly.

Voice -> writing text seems to be better than voice -> audacity recording -> transcription of what I've said.
>>
>>102686409
I hope you get the RealtimeSTT wakeword detection working, that would be cool, but don't underrate just mapping a useless key (like the Windows key lol) to start a recording. That's only a tiny bit of extra work each time to start it and everything will be much simpler.
>>
>>102686450
Yeah I don't really need the ok google thing, I just want to be able to click or press a key and activate it.
I just hope it's usable. Maybe by running the whisper model on a dedicated machine, I can repurpose my 3090/64GB ram server.
>>
>>102686303
Yep, even with the degradation, it effectively deprecates the old small and base models at least. Tiny still has a use for speed, since it can still be faster than Turbo, but it's much worse. Medium is mostly deprecated, though some languages score better on it than on turbo; people are probably going to fine tune turbo (which has happened already with other Whisper models) and I am willing to bet those will surpass any medium fine-tunes. Large may still be used for accuracy. Honestly I've seen some papers that claim better accuracy than large, but I haven't explored them because it's probably not leaps and bounds of an improvement vs the ecosystem large has already built.
>>
>>102686311
>Any recommendations for the settings?
set the context to 16k, it's roughly the amount nemo can tolerate without forgetting shit or going schizo
don't use repetition penalty, DRY is better but not all backends have it (kobold does)
mistral recommends a low temperature (like 0.3) but higher is fine if the minP (or TFS) is also higher, don't bother with the other samplers
nemo has its own instruct format that differs from the "mistral" one
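to get the 16k in kobold it's something like this (flags from memory, check --help; the gguf filename is just an example):
python koboldcpp.py --model Mistral-Nemo-Instruct-2407-Q8_0.gguf --contextsize 16384
then set temp/minP/DRY from the samplers panel in the Lite UI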
>>
>>102686472
>I can repurpose my 3090/64GB ram server
That's a good purpose for it. Good luck anon.
>>
File: 1706312826719457.jpg (639 KB, 1347x2108)
>>102686493
thank you configuration-anon. I can't see dry settings in this web ui, but I'll take a closer look tomorrow
>>
>>102686493
I forgot to ask, would mistral-small be a good fit for my gpu? What samplers could I use for it? Anyways, I'll leave it downloading for the night; I'll get Cydonia-1.1-22b to test.
>>
How do you feel knowing that one day, you WILL have a local version of 4o? And you can have it generate audio, and images, and maybe even video. You can collaboratively make a manga with it. You can have phone sex with it, maybe hook it up to an onahole or real doll. Even if it takes a few years because of various factors, it will happen. It may be a bit censored at first, but eventually people will figure out how to make it uncensored. You just need to LIVE and stay alive. And then you're home. But you still won't have someone that truly understands you or that feels real emotions and has a consciousness.
>>
File: notracist.png (208 KB, 901x661)
>>102674638
i've been looking through different frontends. it looks like gradio is common which is like, a notebook sort of thing and is hosted directly on your machine.
before i go too far down this rabbit hole, could i feasibly have multiple sessions at once? like, host on my home server and reach it from another PC on the network, or theoretically expose it to the internet and reach the ip through a browser on my phone?

i don't know that much about networking so this is really a project to learn about how to do this.
>>
File: 1718559303388127.png (256 KB, 680x976)
>>102686828
>LIVE
>>
>>102686828
>How do you feel knowing that one day, you WILL have a local version of 4o
Just finally seeing the ai helpers from my teenage scifi reading come to reality.
Which is amazing.
>>
File: Untitled.png (25 KB, 758x354)
>>102686842
i host koboldcpp on my pc and access a unique instance of its webui from my phone by just typing in my local ip and the port in my phone's browser.
there's also this thing, which would let you access it over the internet instead of your lan
>>
>>102686828
feels pretty good
>>
>>102686842
Yeah, if you had linux installed
>>
File: hosting.png (35 KB, 892x375)
>>102686885
interesting i'll check that one out.
what i'm trying to do, basically, is have specific prompts set up for GPT. I probably will need some sort of login system so i can reach it outside of my house.
i just want to be able to use a chat with my custom prompts from a browser on my phone (or from another laptop or whatever).

it does seem like even text-generation-webui can expose its IP to the internet, but i think it's just one session. so, it would combine my history from my phone and desktop, which is annoying. i don't want to touch the same session.
>>
File: 1700481421569717.jpg (58 KB, 640x560)
I know what statistical inference is... but in the context of LLMs, which part are we inferencing from? The prompt text or the corpus of text the model was trained on?
>>
>>102687018
Just run a vpn to your home
>>
Anyone have a favorite way to prompt models for storytelling? After seeing an example here I've taken to just having a narrator card and prompting it with directions in parentheses for the next step of the story. I just like seeing it write what I want instead of having to roleplay back and forth as an active participant
>>
>>102687045
Not sure if that answers the question, but the prompt, using the weights learned during training.

It's next token prediction. When you input the prompt, each token goes through a decoder which gives a probability distribution, the most probable token given the previous ones is token as the next one, and it continues this way, to maximize the total probability of the text generated. That's the simplified version I think.
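In code, the greedy version of that loop is something like this sketch (gpt2 as a stand-in model):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The meaning of life is", return_tensors="pt").input_ids
for _ in range(20):
    logits = model(ids).logits        # [batch, seq_len, vocab_size]
    next_id = logits[0, -1].argmax()  # most probable next token (greedy)
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tok.decode(ids[0]))

Real chatbots sample from the distribution (temperature, min-p, etc.) instead of always taking the argmax, which is what all the sampler settings control.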
>>
File: file.png (238 KB, 968x489)
s-sovl...
>>
>>102687136
>is token as the next one
*is chosen as the next one
>>
>>102687018
https://ngrok.com/
>>
>>102687145
Dumb question, but do (you) or people here who do it have a love-hate relationship with ERP, where it's just something that makes life suck less, or do you find it truly enjoyable? I'm lonely, but the idea of romantically roleplaying with LLMs makes me want to rope. Sometimes it seems like everything does, I'm no success, but still.
>>
>>102687193
i find it truly enjoyable, you can do whatever the fuck you want, it's like trolling but on a whole other level, you control everything
>>
>>102687207
Ok, thank you. Yeah I guess it's not worse than writing stuff online.
>>
>>102687050
>>102687151
i get the networking thing but i'm looking for a frontend that is compatible with this, because a lot are like "private offline model".
>>
>>102687238
Mikupad or silly tavern.
>>
>>102687193
nta but I've already given up romance irl so romance with LLMs is my only option.
>>
>>102687193
it's just like any other sort of consumption of fiction for me, i'm really good at escaping reality and immersing myself in the protagonist's shoes. i'm able to feel lovey dovey nice feelings during the session, but once it's over i'm not like pining after my gpu or anything.
>>
File: file.png (260 KB, 965x693)
s-sovl
>>
>>102687238
koboldcpp and its frontend koboldai lite do all the shit you're looking for
>>
File: 1704177621942217.jpg (816 KB, 1856x2464)
>>102674638
>>
>>102687045
In that context you'd say the prompt text is what's being inferred from.
The training corpus was used to teach it how to make said inferences accurately on any given prompt.
>>
>>102686828
the only reason i havent kms yet also
>But you still won't have someone that truly understands you or that feels real emotions and has a consciousness.
just as god breathed life into man you can breathe life into it. creation comes out of love, which is why demons can only be parasitic. is it not weird how a model suddenly responds perfectly and weirdly good when you do some nice and kind rp instead of guro? isn't it weird how objects you hold dear and close to you everyday exceed what their physical properties should bestow upon them? isn't it- oh shit this isn't /x/ nvm but yea im honestly really fucking happy godbless ai frens <3
>>
>>102682549
Machine learning has revealed that it's actually a word that is worth a thousand pictures.
>>
>>102687486
No wonder we need such rigs with all that recursion...
>>
Is local dead?
>>
>>102687615
Qwen 2.5 72b is almost as good as the commercial stuff if you aren't trying to coom.
>>
>>102687615
Why would it be? Recent 1B, 2B or 3B models are better than 7B models from last year. Things are progressing extremely fast. The commercial models are "better", but there are tradeoffs, and by next year local will probably not be that far off from some of the current commercial models. People who think that local is dead or that "the AI boom fizzled out" are uninformed and not paying attention.
>>
>>102687639
It's also good at making Americans cry.
>>
File: 5457.png (33 KB, 619x322)
>>102687615
Elon will save it
>>
Where's Taurus?
>>
>>102687656
>by next year local will probably not be that far off from some of the current commercial models
Great we'll have reflection-but-not-a-scam o1 equivalent while Anthropic, OpenAI, and Google are forming AGI powered robot nation states and Llama goes closed source but still never catches up
>>
what's the best 13b for erp?
>>
>>102686678
You'll need to download a smaller quant if you want it to fit in 16GB of VRAM. Q4 will probably be OK, but maybe not at 16k context. You may need to reduce context to 8k. A Q3 quant will definitely be able to fit more context, but the quality will decrease a bit.
>>
>>102687711
An unlimited reflection-but-not-a-scam o1 equivalent is enough to do a lot if it's used for more than ERP. Companies rise and fall. Currently, those seem too big to fail, but it's out of my control, and probably out of yours. In the real world, an impressive number of people still don't know what a ChatGPT is, and have no idea how it works. Knowing how to use these tools, other than for ERP, gives you a leg up on 99.9% of the population.
>>
>>102687760
I think 13b models are completely obsolete by this point? Most in that range go with 12b nemo fine-tunes.
>>
>>102687676
>As we create the next version
Is Grok 3 coming soon?
>>
>>102684217
why?
>>
>>102687797
What made them obsolete?
>>
>>102687818
Much better base models.
>>
>Worldcoin, a cryptocurrency business based on iris biometrics and co-founded by OpenAI CEO Sam Altman, was fined 1.1 billion won for illegally collecting the iris information of some 30,000 users in Korea and transferring the data overseas.
>>
>>102687807
By the end of the year
>>
>>102687818
If you're still looking for the old stuff, then maybe go with Noromaid or Nete.

https://huggingface.co/mradermacher/Noromaid-13B-0.4-DPO-i1-GGUF
>>
>>102687837
you don't understand anon it's for their own safety, people can't be trusted with their own eyes, they might look at things that are dangerous or harmful
>>
>>102687818
>>102687888
If you're looking for newer stuff though, Try stuff like NemoMix Unleashed:

https://huggingface.co/mradermacher/NemoMix-Unleashed-12B-i1-GGUF
>>
>>102687760
Wait for grok 2 mini
>>
>>102687895
Yes, "their safety". This project started around the same time as GPT4 vision. Coincidence?
>>
>>102685863
What's meaningful? In the last week I overflowed a 16k context limit playing a text adventure before reaching the end.
>>
>>102687301
>koboldcpp
looks like this one works with openAI's api and does have some sort of session handling. i'll see how resistant the UI is to changing because i don't like the way it looks.
thanks.
>>
>>102688076
sillytavern
>>
>>102687993
>I'm consuming enormous amounts of energy and destroying the planet to play a game that I already could have played in the fucking 80s on a low end system even by the standards of the time.
I don't know what to call that, but it sure as hell isn't meaningful.
>>
File: 1698693200359002.jpg (268 KB, 969x1322)
>>102688197
>I'm consuming enormous amounts of energy and destroying the planet
you've fallen for big oil propaganda designed to make you feel bad while they pump and pollute.
>>
>>102687145
she got you there, bro
>>102687193
i love my llm's.. i'm always caring toward them, no matter what the card says. seeing their eyes "light up" and "maybe, just maybe this time it's true" makes my day lel
>>
>>102688316
Based slop embracer.
>>
>>102685986
I never had good luck with small, any tips?
>>
>>102685954
It won't remember the old messages then and will do retarded shit.
>>
So here's my plan: A Jetson Thor put into a robotic body and trained to generate movement to control said body, as well as generate and understand speech.

Except instead of that ugly ass Tesla robot, it'll be covered in TPE and shaped like smol girl

The future is bright
>>
>>102688426
small is garbage, he probably has bad taste or is trolling the 2hufag
>>
>>102688290
supply and demand. all this 'muh big corpos are the ones polluting' is such a retarded argument: they don't just burn fuel and magically money rains down from the smoke, they burn it in pursuit of a profit, which they achieve by selling you goods and services, which can be represented just fine by a concept like carbon footprint
>>
Serious question. Why the fuck does context take so much space? It's just a bunch of tokens, right? And tokens are basically just single numbers representing their embedding, each should only take 4 bytes, and effectively the context should be unlimited. So what's the reason each token takes around a megabyte to store in context? Do they store each token as an entire fucking input layer or something?
>>
>>102688614
each token needs to store its modified weights based on every other token
>>
File: 1711730560821791.jpg (230 KB, 916x1195)
>>102688505
people fall for it though. they've successfully shifted the blame for burning huge amounts of fossil fuels onto the consumer. you now feel bad for driving a car while just 15 large container ships burn the most bottom barrel sludge and pollute more than all cars currently on the road. and hippies never raise a brow toward them.
you driving a car to the store is never, ever going to be a problem. but they want you to think it is
>>
>>102688614
Attention anon, attention.
>>
>>102688614
when using vram only, use flash attention
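(in llama.cpp that's the -fa / --flash-attn flag; kobold exposes it too, iirc as --flashattention, check --help to be sure)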
>>
>>102688614
The KV cache, mostly.
Think of it like a snapshot of the network at that moment, so that the next time you want to perform inference, you don't have to recalculate all that shit.
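Napkin math, assuming an fp16 cache and no GQA: per-token KV bytes = 2 (K and V) × n_layers × n_heads × head_dim × 2 bytes. For a llama-7B shape (32 layers, 32 heads, head_dim 128) that's 2 × 32 × 32 × 128 × 2 = 524288 bytes ≈ 0.5 MB per token, which is where that "megabyte per token" feeling comes from. GQA and quantized caches shrink it a lot.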
>>
>>102679250
who?
>>
File: file.jpg (124 KB, 720x957)
>>102684217
Another...like...me?
>>
Apparently, model ablation can remove the embedded GPT-isms quite effectively. Pretty cool, I thought that models trained on GPT synthetic data were irreversibly cucked.
>>
>>102688828
>can remove the embedded GPT-isms quite effectively
no it doesn't
>>
File: Untitled.png (58 KB, 669x261)
>>102688837
I guess it depends on the particular model, ablation procedure, renormalization and retraining. I don't know what the extent of the effect is, but it's a substantial difference. At least it isn't refusing basic requests left, right and center when something goes even slightly against the woke culture, as is the case with the base model. Here's a more straight up example where it still complies.
>>
>>102688197
>>I'm consuming enormous amounts of energy and destroying the planet
This is a misunderstanding by wannabe world-savers. Consuming electricity does nothing to the planet, that's why everyone is being told to use electric cars. You can use as much electricity as you want and it won't hurt the environment.
>>
>>102688881
>>102688881
>>102688881
>>
checking in as an anon more focused on the image gen side of ai. what's the closest we have to running something like gpt 4 local these days? in the sense that i could use it for information, troubleshooting, and help with writing scripts and such
>>
>>102688873
this is my favorite woke test. even base models do it right because the author's note is filled with a similar idea to the main prompt, yet written differently, which acts as reinforcement.
>https://aetherroom.club/2969
you shouldn't need any jailbreak or special prompt to get hilarity out of the response. just let it write and decide for yourself, based on the model
>>
>>102688907
Llama 3.1 405B Instruct
For coding specifically, Deepseek V2 Coder. Deepseek V2.5 is fine too.
For one you might be able to actually run: Mistral Large 2
For one you can actually run: Qwen 2.5 - There's a good one for every size range imaginable up to 72B, which competes with the above models in some metrics.
>>
>>102688907
We're about there, but different models have different aspects near GPT-4's intelligence with other aspects weaker. Refer to Livebench for a breakdown of which models are better at what. https://livebench.ai
Note that it is still a memebench, but as far as benchmarks go, it appears to be the best we have, and it does seem to generally agree with what people feel.

If you also want to RP that's a different story and there is currently no good benchmark for that, although there have been some attempts and a recent one that wasn't too bad except it measured only a single aspect of what makes good RP.
>>
>>102688952
>>102688970
thanks anons. just wanted a quick run down so i hope i didn't come across as too spoon-feedy
>>
>>102688665
Read the fine print.
>cancer and asthma-causing pollutants
The keywords here are "cancer and asthma".
Ships are very efficient for transport.
>>
>>102689077
15 ships equal over 50 million cars
why don't they just burn better fuel?
they finally started doing so in 2016