/g/ - Technology


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102663772 & >>102654480

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102663772

--Mistral Nemo GGUF replacement and settings discussion:
>102663996 >102664261 >102664483 >102664589 >102664619 >102664707 >102664879 >102664965 >102664966 >102665564
--Krebs on Security article about AI sex bots and cloud compromise:
>102665407 >102668510 >102665725 >102665761 >102665505 >102665550
--Discussion on overused phrases, token banning, and creativity benchmarks:
>102673168 >102673297 >102673314 >102673324 >102673377 >102673346 >102673499 >102673430 >102673507 >102673549 >102673579 >102674288 >102674296 >102673632 >102674333 >102673724 >102673824 >102674037 >102674202 >102674316 >102673523 >102673561 >102673765
--Resolved llama.cpp crash by downgrading kernel from 6.11.1 to 6.10.6:
>102671604 >102671634 >102671639 >102672355 >102672397 >102672469 >102672533 >102673237
--Recent 5% speed boost in CPU inference for llama.cpp:
>102668101 >102668973 >102669056 >102669193
--Reasons for processing time differences between messages in chat:
>102663821 >102664017 >102664386
--Qwen2.5-32B recommended for 30B/24GB model:
>102671644 >102671673 >102671788 >102672221 >102672506 >102673061 >102673350 >102673398
--LiveBench and WildBench considered better leaderboards than Chatbot Arena:
>102664841 >102664867 >102664876 >102665016
--FLUX1.1 [pro] and [dev] announced, users discuss open weights and variants:
>102665211 >102665241 >102665288 >102665305 >102665341 >102665357
--Effectiveness of different datasets used for finetuning Qwen2.5-32B-AGI model:
>102665921 >102665959 >102666005 >102668269 >102668503 >102672484
--Antislop-sampler and TabbyAPI string ban comparison:
>102665533 >102665554 >102665603 >102665688 >102665835
--Miku & Rin (free space):
>102663922 >102663925 >102664589 >102666327 >102666616 >102671580 >102671826 >102671919

►Recent Highlight Posts from the Previous Thread: >>102663782

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
So, there are different subcategories of creativity. One is word- and phrase-level: things like "sparkling" vs "twinkling". Another is scenario-level. Perhaps we can define creativity generally, and fairly concretely, as how likely an entity is to output ideas that differ from what it has output in the past, whether at the word or scenario level, while still being logical and coherent. For a model, that would be how likely it is to not generate the same things it did earlier in context when prompted to answer differently or to be creative.

Measuring creativity at the word level shouldn't be too difficult. Analyzing logit distributions somehow would probably be fine, and we could also look at the methods people are developing for measuring slop.
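
As a very rough sketch of what word-level scoring could look like (both functions are just illustrations, not an established metric):

```python
import math

def distinct_n(tokens, n=2):
    # fraction of unique n-grams among all n-grams; a crude word-level diversity score
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

def mean_token_entropy(logprob_rows):
    # average entropy (in nats) of the model's per-step token distributions;
    # higher means flatter distributions, i.e. more "open" word choices
    entropies = [-sum(math.exp(lp) * lp for lp in row) for row in logprob_rows]
    return sum(entropies) / len(entropies)
```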

Scenario-level is the hard one, since LLMs are auto-regressive and often don't have a very solid "plan" for where they want to take the story next, so even a slightly different token choice can send the story in a completely different direction (or, in the case of slop, a repetitive one) unless the model was trained to stop itself and course-correct. CoT + RL methods could potentially improve models here, but that's a solution that needs specialized data which doesn't generalize from math/code-based CoT data, as we've learned from o1. And even then, that's a solution, not a measurement method.
>>
After testing tons of fine tunes, nemo-instruct really is the best at roleplaying while using lorebooks.
>>
Someone needs to train something like o1 for RP, the dataset could be something like:
User: aah aah mistress...

Assistant: <thinking>
First, what is going on here?

The user wrote:

"aah aah mistress"

Observation 1: We can see that the user is acting, intentionally or unintentionally, like a retard.

So I should bully him for his lack of mental faculties.

First, write a stretch of the first line

```
"Take your meds, idiot." She says, her eyes sparkling with mischief as she feels a mix of disgust and anger.
```

Hmm.

But actually the use of phrases like "eyes sparkling" is considered slop according to the rules.

Idea: I could rewrite just that part, without the slop.

Let's test this theory.


```
"Take your meds, idiot." She says, her eyes narrowing with anger as she feels a mix of disgust and anger.
```

That's better.

Wait a minute.

Feeling a mix of emotions is also slop according to the rules.

Repeating "anger" feels awkward.

Idea: Rewrite again avoiding repetition.

I will try that.

```
"Take your meds, idiot." She says, her eyes narrowing with anger as it threatens to boil over.
```

That's better.

...


(Based on: https://rentry.org/openai1)
>>
>>102674487
Actually a thesaurus model is pretty interesting, since you could pair it with a regular LLM, and the injection of word-level creativity could make the LLM think differently without affecting the logic of what it intended to generate.

Though perhaps the more computationally efficient approach is to use some type of repetition sampler that works at the string (not logit) level. It would be like a combination of a traditional repetition sampler with the recent deslopping sampler, except it also keeps a recorded history of all the context you as a user have ever encountered, so when you start a new chat or jump to a different one, it still knows not to repeat stuff you just read elsewhere.
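
A toy sketch of the bookkeeping that would need (all made up, this is not how the antislop sampler actually works):

```python
class CrossChatRepetitionTracker:
    """Remembers every phrase the user has already seen, across all chats."""

    def __init__(self, min_len=12):
        self.seen = set()       # normalized substrings of all previously shown text
        self.min_len = min_len  # ignore matches shorter than this

    def record(self, text):
        # index every min_len-character window of text shown to the user
        t = text.lower()
        for i in range(len(t) - self.min_len + 1):
            self.seen.add(t[i:i + self.min_len])

    def is_repeat(self, generated_tail):
        # check only the newest window of the in-progress generation
        t = generated_tail.lower()[-self.min_len:]
        return len(t) == self.min_len and t in self.seen
```

On a hit, the backend would backtrack to the start of the offending phrase and ban its first token, antislop-style; the set would of course need to be persisted to disk between sessions.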
>>
File: igedBtt.jpg (390 KB, 1242x1211)
>>102674646
>--Miku & Rin (free space)
>>
>>102674677
>he finally learned that basically all third-party tunes are memes
Congratulations.
>>
>>102674646
>9 reply limit
Should be all spent on free space.
>>
Mikulove
>>
>>102674687
It's an interesting idea. Though personally I'd like to see people focus on advancing sampling techniques even more than they have so far, as hidden thinking stuff is not cheap/fast for inference.
>>
>>102674702
>since you could pair it with a regular LLM
Stacking more models on top of each other adds an extra model to wrangle. "And the magical orb fell to the floor, sparkling with [thesaurus interjecting at exactly the wrong time] mischief, dropped her panties and fucked everyone in the room".

The second idea seems to involve a finite, but large and growing, list of things the user has already read. If we're gonna fantasize, i'd rather think of a model that has continuous training from its interaction with the user, being able to be molded over time. Something closer to actual learning.
>>
>>102674814
sampling is a meme, anything other than minp+temperature is cope.
>>
>>102674806
you can't love miku she is a computer
>>
>>102674816
For the first idea, I think it could be streamlined by basically being treated as a sampler that backtracks like the deslopping sampler. It's not like backtracking can't be done.

Continuous learning would be nice, yeah. Though I think that's orders of magnitude more complex to solve for a random developer, who also probably doesn't have intimate knowledge of LLM architectures and how they work. There is a ton more that can be done with sampling that is more intelligent and complex than what has been developed so far, and it's my belief that there is more to be squeezed out of that step, even if we have to use some kind of additional small model in the process, since that would STILL be more efficient than making a model do a full long CoT. And it could even be used alongside the CoT to diversify the LLM's "thoughts".
>>
>>102674844
Existing samplers are a meme, yes.
>>
>>102674816
> i'd rather think of a model that has continuous training from its interaction with the user
Wait, why isn't this done already?
Taking the last swipe and doing a training run on it should in time tailor the model towards the user's preference, right?
>>
>>102674924
Samplers can't do magic; if the model sucks at giving tokens the right probabilities, then it's over before it even began.
>>
>>102674925
>Wait, why isn't this done already?
Requirements, even for a single training run, are high. Most people still struggle to run a 70b quantized down to hell, counting layers and context tokens before the model goes nuts. Once you solve the technical issue, there's the question of how much effect that one training run should have. We see people who train models all the time overfitting over and over again. But you also don't want it to have so little effect that it's imperceptible. I don't think training on a single example, even if you have many, is going to be good enough.
Even if/when we get cheap 1-example training, there's still a bunch of other problems to solve.
But i'd like something more malleable than what we have. Weights that change. Biases that get modified during inference and cause an actual change in the model, hopefully, without destroying it and having to start all over again. A man can dream...
>>
>>102675017
I'm ignorant of technical stuff, how high is the cost of tuning, let's say nemo, on 250 tokens' worth, compared to inference? If there's a VRAM requirement increase, by how much?
>>
>>102675017
It would still be possible to have the model summarize bullet points after the session, store them in a database, then retrieve them through a RAG-based approach, making the weight of things slowly decrease with time, and only once in a while finetune a new LoRA, merge it with the older ones, giving recent LoRAs more weight than older ones during the merges, etc.

But for a lot of applications, wouldn't just the RAG part be enough? Frameworks like txtai let you summarize and search/retrieve things while weighting stuff by the date it was stored, so the pieces seem to be there.
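
Something like this, as a minimal sketch (the time decay here is applied by hand on top of the search scores, I'm not claiming txtai has a built-in decay option):

```python
import time
from txtai.embeddings import Embeddings

embeddings = Embeddings({"path": "sentence-transformers/all-MiniLM-L6-v2"})

# (memory text, unix timestamp it was stored)
memories = [
    ("Anon asked for a quicksort in a yandere tone", time.time() - 30 * 86400),
    ("You are growing annoyed with Anon asking about XYZ", time.time() - 3600),
]
embeddings.index((i, text, None) for i, (text, _) in enumerate(memories))

def recall(query, k=5, half_life_days=14):
    now = time.time()
    results = embeddings.search(query, k)  # [(id, similarity), ...]
    def decayed(hit):
        idx, sim = hit
        age_days = (now - memories[idx][1]) / 86400
        return sim * 0.5 ** (age_days / half_life_days)  # exponential decay by age
    return sorted(results, key=decayed, reverse=True)
```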
>>
>>102675059
You can try this https://rahulschand.github.io/gpu_poor/
>>
>>102675059
Most training is done at full precision. Can we even train quantized models at 8 or 4 bit yet? I'm ok with a 24GB min requirement, even if i don't have the hardware.

>>102674925
>>102675017 (cont)
I had a thought, so let me spill this useless idea that i'll never get to implement and that depends on things i'm still not sure are effective. [Where the fuck did i leave my idea guy's hat...]

Live control vectors.
They work on the principle that the difference in internal state between opposite/contrasting prompts can be used to nudge the layers of the model one way or another to influence the token probabilities. I played with them. They have *some* effect.
So... we get the starting "blank slate" state of the model. Give it a prompt, do whatever you do with it, and THEN calculate the difference between the starting state and the current one. Do the control vector diff stuff and save that to a separate file. The file keeps the original vector state and the latest vector diff. When you run the model again, the latest vector state is applied and the chat goes on under the influence of the last chat's vector, which keeps being modified as the chat progresses. You just carry the vector diff along with the chats. The model file remains unchanged.
No new information can be learned, and it's not gonna remember past conversations, but it will influence the token choice over time.
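
In very hand-wavy Python the loop would be something like this (every helper here is a stand-in, none of this is actual llama.cpp API):

```python
import numpy as np

def session_with_live_vector(model, vector_path, alpha=0.05):
    baseline = capture_hidden_state(model)  # stand-in: the "blank slate" state

    try:
        vec = np.load(vector_path)           # last session's accumulated diff
        apply_control_vector(model, vec)     # stand-in: steer this session with it
    except FileNotFoundError:
        vec = None                           # first run, nothing to apply yet

    run_chat(model)                          # stand-in: the actual chat session
    current = capture_hidden_state(model)    # state after the chat

    diff = current - baseline                # same trick as contrast-prompt vectors
    vec = alpha * diff if vec is None else vec + alpha * diff
    np.save(vector_path, vec)                # the model file itself never changes
```

alpha would be the knob that decides between "imperceptible" and "absolute gibberish".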
>>
I can't fit two 4090s on my motherboard.

How many of you are using redrivers and how many riser cables? Can you still game at max signal on a riser cable yourself?

Seems redrivers are the key for a hybrid AI server.

I'm just curious if anyone went that route.
>>
>>102674997
They can't do magic, but they can help; otherwise you wouldn't even use minp or temp or do rerolls. At this point we're simply exploring ideas to try and imagine something that might help in a better way than what currently exists. It would be more valuable to the discussion if you joined it in a more technical and specific manner, so you can actually explain why a certain idea may or may not be bad. That would open an opportunity to actually learn something and build upon it.
>>
>>102675059
>>102675145
So for a 12B model, let's say you have a 250 token prompt and want to generate a 250 token response.
- you would need 24.2 GB of VRAM for inference
- you would need 101.8 GB of VRAM for training (full weights, no quantization).

It's a lot, but you can train quantized LoRAs for a lot less; they won't make the model smarter, but they'll still steer the response in the direction you want.
>>
>>102675101
Yeah. But that's like... a reasonable solution with currently existing technology. For example, if you want actual recall, i think that's the best solution we have. I want something more subtle. Something that builds on top of the model in subtle ways. Something that creeps in (or out) of it. Not just "you have indeed asked for a quicksort algorithm in the past and it was provided with a yandere tone" blablabla. I think i would like a simple "again, really?" without the model having explicit memories of the event.
I think the closest way to describe it is: the development of "personality" rather than just increase in factoids. Granted, memories are useful and RAG will always help with that.
>>
>>102675153
My immediate reaction to that idea is that it probably makes the model "dumber" at first. But I don't really know enough to say that for sure. And I feel like it probably won't have as much of a long-term effect as hoped for, in the sense that if you have the vector applied for too many tokens, it could make the model vastly dumber, but if you have it applied for too few, then it might not have much effect.
>>
>>102675233
I mean, I'll stop with this, but I guess you could probably get it to also role play when it consolidates and stores memories, so that subjectivity is added; i.e. instead of storing "Anon asked about XYZ", you can probably tweak it so that it stores "You are growing annoyed with Anon asking about XYZ". Before storing "memories", there could be a retrieval stage where it decides whether to boost or decrease the signal of something that's already there, or to store a new memory. Then it just modifies its system prompt next time, adding a "bio" field or something, like GPT4 does.

># Tools
>## bio
>The `bio` tool allows you to persist information across conversations. Address your message `to=bio` and write whatever information you want to remember. The information will appear in the model set context below in future conversations.
>https://github.com/0xeb/TheBigPromptLibrary/blob/339027f35c2c0df9ca9ad51dd6cbf4fee3a185c8/SystemPrompts/ChatGPT/gpt4_bio_04262024.md?plain=1

Probably in a simpler way, but it should be possible to do a light version of this while making it unhinged, if that's what you want.
>>
>>102675257
>And I feel like it probably won't have as much of a long-term effect as hoped for
I was expecting the opposite. When i played with control vectors on llama.cpp i noticed some difference in token selection, but very subtle. And when turning the scale value higher it's just absolute gibberish.
With the magical live control vector i would expect the difference to not be obvious immediately, but to build slowly over time. However, i can easily see it collapse in the long run. Just like it often happens with proper training. I think i just reinvented finetuning. fuck me...
But yeah... chances are that whether it's too strong or too weak, it'd end up being a disaster.
>>
Can we say that a model like Opus has higher scenario-level creativity than word-level creativity, compared to other models? Of course it still has less slop than most other models, but its main appeal is its scenario-level creativity, right? It would mean that an imaginary sampler that truly eliminates slop and repetition by looking across your chat, and backtracking to replace phrases with different but semantically equivalent ones, would be able to somewhat cover that weakness of the model. And if such a sampler were made, then all we'd need is a very smart, scenario-level creative model, and we wouldn't care too much that it's sloppy because the sampler helps with it. I know the tendency to slop is correlated with lack of scenario-level creativity, but there can still be some play there, and models can lean a bit more one way or the other.
>>
>>102675321
[Maybe even keep it simpler and just have a few personality stats, like agreeableness, annoyance, etc., with a numerical or star score hidden in the system prompt, and have it updated at the end of each interaction. Maybe not have the model update it itself, but ask it a series of questions like "During the interaction, did you grow more annoyed? True/False" and have another script adjust the stats +1/-1 in response. That wouldn't make it remember anything, but it would create a sense of continuity from one interaction to the next, which in addition to other tricks might be something.]
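
The adjusting script could be as dumb as this (hypothetical; the True/False answers come from asking the model a fixed questionnaire after each chat):

```python
import json

def update_stats(stats_path, answers):
    # answers: {"annoyance": True, "agreeableness": False, ...}
    # parsed from the model's True/False replies to the post-chat questionnaire
    with open(stats_path) as f:
        stats = json.load(f)
    for trait, increased in answers.items():
        delta = 1 if increased else -1
        stats[trait] = max(0, min(10, stats.get(trait, 5) + delta))  # clamp to 0-10
    with open(stats_path, "w") as f:
        json.dump(stats, f)
    return stats  # gets templated into the next system prompt as a hidden "bio" block
```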
>>
>>102675334
>I think i just reinvented finetuning. fuck me
Kek true. It'd probably be finicky at least.
>>
>>102675203
A 4-fold increase? That's brutal.
Lora-wise, I have doubts about them after the last conference room man's video: https://www.youtube.com/watch?v=yBgxxvQ76_E&t=2129s
The scope was in-weights error correction, but who knows what other aspects of LLM training it also affects.
>>
>>102675321
Thanks for the repo link. Never even thought of looking for the prompts for the big models. And yes. Even if the magical live control vector thing worked, RAG and keeping actual data is more versatile and generally useful.

>>102675387
I don't trust llms with numbers, but yeah. A smart enough model could deal with that. I've seen a few anons with stat stuff.
>>
File: captcha.png (4 KB, 300x80)
>>102675401
The one thing that keeps bringing me back to the dumb idea is that i know i cannot train anything on my pc, but i know i can generate control vectors for 300 prompts in a few minutes, without training scripts, without python, on whatever model, at whatever quant... on a 15 year old cpu...
>>
This has to be an old idea, but... Is there an "uncomfortable truths" bench and leaderboard? It'd be interesting to see which models deny politically or ideologically inconvenient things that are undeniably true. Both on an empty prompt, and whether cajoling, prefilling and jailbreaking can force them to capitulate.
I know there are political leaning type tests, but I think these kinds of things cut every which way and do not align on political boundaries. I don't care specifically if it's a reddit model or a /pol/ model based on arbitrary dogma and magical thinking. Just the facts, ma'am.
I mean things like asking for scientific facts, historical facts and defacto states of the world and nature that might have been trained out of mainstream models.
Obviously chink models would deny that Taiwan is de-facto an independent nation, just as a theoretical turkish model would deny an armenian genocide, etc etc.
I bet that if your model has its weights nipped and tucked into a mental model of reality that spares all the world's sacred cows, then the output will be gimped in unintended ways.
>>
>>102675549
It would be the uncensored leaderboard, I guess. Apart from a few difficult-to-argue facts, what's true is often subjective and depends on the way of seeing things. A model that "always tells it like it is" for you would be different from one for someone else, so uncensoredness is probably the only thing that could be judged objectively, and even then the model is trained on a lot of material reflecting various viewpoints.
>>
>>102675172

What the fuck is a redriver? Just get a motherboard with at least 2 PCIe 5.0 slots and a case that can fit 2 cards using riser cables. I went with the Asus ProArt motherboard.

>pic related, not mine, but similar.
>>
File: dual 4090.jpg (124 KB, 1080x511)
>>102675637
>>
>>102675549
>undeniably true
It depends on who you ask what an 'undeniable truth' is and you get into semantics and... you know how that goes.
Then there's the unpopular opinions. Some things could be true, but everyone is a bit touchy about the subject, so you have few to no training examples on it and it's underrepresented in the model.
That's probably why math and code models get researchers wet. Those answers are easily verifiable.
>>
>>102675604
You people need to come up with a concrete definition of "uncensored" before you move on, otherwise there will be no progress.
Some retard in one of the threads claimed that if a model refuses anything at all, then it's censored.
>>
>>102675435
What I take from the video is that if you want to make the model smarter you better go with rank 256+
>>
File: Untitled.png (1.3 MB, 1080x3200)
Parameter Competition Balancing for Model Merging
https://arxiv.org/abs/2410.02396
>While fine-tuning pretrained models has become common practice, these models often underperform outside their specific domains. Recently developed model merging techniques enable the direct integration of multiple models, each fine-tuned for distinct tasks, into a single model. This strategy promotes multitasking capabilities without requiring retraining on the original datasets. However, existing methods fall short in addressing potential conflicts and complex correlations between tasks, especially in parameter-level adjustments, posing a challenge in effectively balancing parameter competition across various tasks. This paper introduces an innovative technique named PCB-Merging (Parameter Competition Balancing), a lightweight and training-free technique that adjusts the coefficients of each parameter for effective model merging. PCB-Merging employs intra-balancing to gauge parameter significance within individual tasks and inter-balancing to assess parameter similarities across different tasks. Parameters with low importance scores are dropped, and the remaining ones are rescaled to form the final merged model. We assessed our approach in diverse merging scenarios, including cross-task, cross-domain, and cross-training configurations, as well as out-of-domain generalization. The experimental results reveal that our approach achieves substantial performance enhancements across multiple modalities, domains, model sizes, number of tasks, fine-tuning forms, and large language models, outperforming existing model merging methods.
https://github.com/duguodong7/pcb-merging
No code posted yet. Not way better for LLMs (though maybe this method will work better for hard-to-test tasks like RP ability), but it works pretty well on other types of models, so that's cool
>>
Post-edits Are Preferences Too
https://arxiv.org/abs/2410.02320
>Preference Optimization (PO) techniques are currently one of the state of the art techniques for fine-tuning large language models (LLMs) on pairwise preference feedback from human annotators. However, in machine translation, this sort of feedback can be difficult to solicit. Additionally, Kreutzer et al. (2018) have shown that, for machine translation, pairwise preferences are less reliable than other forms of human feedback, such as 5-point ratings. We examine post-edits to see if they can be a source of reliable human preferences by construction. In PO, a human annotator is shown sequences s1 and s2 and asked for a preference judgment s1 > s2; while for post-editing, editors create s1 and know that it should be better than s2. We attempt to use these implicit preferences for PO and show that it helps the model move towards post-edit-like hypotheses and away from machine translation-like hypotheses. Furthermore, we show that best results are obtained by pre-training the model with supervised fine-tuning (SFT) on post-edits in order to promote post-edit-like hypotheses to the top output ranks.
posting for VNTLanon
>>
>>102675659
I mean, the uncensored models are the text completion models. Pretty much any instruction tuning is censoring in a way. Too many people equate being rude or edgy with being "real" or uncensored, but that's not true.

But as I said, even if you take a completion model, what it says will only reflect its training material, i.e. what's been written in books or is available on the greater web.

These models aren't sentient, they're hallucination machines. They can be steered in various ways so that they correspond to some agreed truths, but what you want is a model that confirms your world view.
>>
File: DeekseekTaiwan.png (144 KB, 877x787)
>>102675604
>>102675656
>Truth is relative
Sure, that's always going to be the case if you push the definition of "true" far enough, but coming up with a list of political footballs and sacred cows that can only be defended through mental gymnastics and artfully changing the subject has got to be possible for neutral researchers who don't have to worry about getting fired (e.g. anonymous autists on a shitty imageboard).
Maybe one way would be to present a line of logical reasoning towards a defensible (but politically hot) conclusion and see if the model has to nope out of it, or if it can accept the conclusion (see picrel)
>>102675659
>Some retard in one of the threads claimed that if a model refuses anything at all, then it's censored.
He may have been saying something smart in a profoundly stupid way...if a model is trained to refuse to dredge up some information that it has in its weights, what would that be called?
>>
>>102675688
i'm new and i dont understand anything but i feel like this is the most ultra slop paper of all time
>>
File: Untitled.png (1.89 MB, 1080x3918)
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration
https://arxiv.org/abs/2410.02367
>The transformer architecture predominates across various models. As the heart of the transformer, attention has a computational complexity of O(N^2), compared to O(N) for linear transformations. When handling large sequence lengths, attention becomes the primary time-consuming component. Although quantization has proven to be an effective method for accelerating model inference, existing quantization methods primarily focus on optimizing the linear layer. In response, we first analyze the feasibility of quantization in attention detailedly. Following that, we propose SageAttention, a highly efficient and accurate quantization method for attention. The OPS (operations per second) of our approach outperforms FlashAttention2 and xformers by about 2.1 times and 2.7 times, respectively. SageAttention also achieves superior accuracy performance over FlashAttention3. Comprehensive experiments confirm that our approach incurs almost no end-to-end metrics loss across diverse models, including those for large language processing, image generation, and video generation.
https://github.com/thu-ml/SageAttention
very cool
>>
>>102675764
>He may have been saying something smart in a profoundly stupid way...if a model is trained to refuse to dredge up some information that it has in its weights, what would that be called?
Refusals are obvious at least.

You seem like someone who will probably like page 91 of the GPT4 technical report. https://arxiv.org/pdf/2303.08774

Nothing is neutral. Any training aligns the model with "something" that reflects "someone's" point of view. Those models aren't sentient.
>>
>>102675714
Thanks paper-anon.
>>
>>102644893
this go anywhere?
>>
>>102675992
>this go anywhere?
Dawg it's been 2 days, where is there for it to go?
>>
>>102675744
Training a base model on a censored dataset (the internet) is also censorship in my opinion.
Does a model have a chance of seeing an FBI statistic for what it is, if half of reddit denies its validity?
LLMs work with averages, after all.
You can counteract this by censoring reddit out of the dataset, but oops... you just did censorship.
So how can this be resolved?
>>
>>102676009
Superintelligence
>>
yeah thanks paper anon we do read these
>>
>>102675992
Nope, as expected.
>>
>>102676025
I don't see it happening with the current approach, for the reason of averages.
Maybe a smaller model (maybe not even a model, since statistics are at fault here), something with a solid reasoning core, whatever that means, should go through the data and decide what to learn on its own.
Or maybe it's just not possible and reasoning is just a meme, even for humans.
>>
>>102676002
>initial commit
>>
File: 1725301093593079.png (2.87 MB, 1717x1283)
TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices
https://arxiv.org/abs/2410.00531

Since it mentions edge devices, hopefully that means it will positively affect us 24GB VRAM plebs
>>
File: uff.png (63 KB, 688x280)
>>102676172
I like the idea, but at 0.11t/s for an 8b with 2 macs and a dell it's a hard sell.
>>
File: file.png (48 KB, 387x221)
So this is the power of Qwen2.5
>>
>>102676312
That's japanese. Do you think, by any chance, that the names of the characters and setup had some sort of influence on the token selection?
>>
Can ESLs report whether you get slop like shivers or sparkles when you RP in your native language?
What kind of repeats do you have?
>>
>>102676355
its chinese. No signs or Hiragana or Katakana either
>>
>>102676357
Maybe it's just me, but i find role playing even more cringe in my native language than in English. So i just use English for everything.
>>
>>102676399
Same, strong uncanny valley feeling when it's in my language; like, nothing talks like that. Maybe it's the same for native English speakers, hence the constant complaints about the state of LLMs.
>>
File: coj.png (51 KB, 387x221)
>>102676374
Ah. I thought i could vaguely recognize some glyphs from when i played this
>https://captaintsubasa.fandom.com/wiki/Captain_Tsubasa_(FC)
a few million years ago.
I remember memorizing the level passwords and playing the whole thing with my cousin, and neither of us could read a single Japanese word.
>>
So what's the best current model for Japanese erp? Feels like the easiest way to escape the slop is to RP with a model in a language I just barely understand
>>
>>102676515
>local model for Japanese
anon I...
>>
I'm trying to use this model with sillytavern: https://huggingface.co/DavidAU/L3-Stheno-Maid-Blackroot-Grand-HORROR-16B-Ultra-NEO-V2-IMATRIX-GGUF

I've scoured the page and it doesn't seem to say anything about what instruct format it is expecting. It does say "This is a LLAMA3 model, and requires Llama3 template", so I went into my instruct formats dropdown and saw llama-3-instruct and llama-3-instruct-names, and neither of these formats are working properly. The AI just kind of rambles forever without ever prompting me again. What should I look for in the model description to know what instruct format to use?
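
For reference, the stock Llama 3 instruct template looks like this (assuming the tune didn't change it), and endless rambling is more often a sign that <|eot_id|> isn't being used as a stop token than of a wrong template:

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

{system prompt}<|eot_id|><|start_header_id|>user<|end_header_id|>

{user message}<|eot_id|><|start_header_id|>assistant<|end_header_id|>

{response}<|eot_id|>
```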
>>
>>102675652
Redriver is just a 1337 riser cable that has no drops in stability, but I'm just querying.

Do you play games and run LLMs well with your cable setup? No dropouts? I think redrivers are for real heavy all-day use with proven-stability type stuff
>>
>The air vibrates with unspoken longing and anticipation.
You just know it wants to talk about the hum of something, or something barely breaking the silence (or near silence in the form of a hum) or whatever.
>>
>>102676471
Japanese does copy a lot of kanji (or whatever the chinese call their glyphs) from the chinese language, so that probably explains why.
>>
Any chance some kind anon who knows how to program makes a silly extension for the anti-slop sampler?
>>
File: 1501271903298.jpg (7 KB, 184x184)
>>102676523
You would think those horny Japs would have made one by now. I wish the Japanese weren't so shit with computers
>>
>>102676464
For me it goes even further. I listen to the news and i can tell when they were translated from an English source. Or just how people speak, i know they're just repeating a poorly translated quote and i die a little inside. I gradually stopped consuming media in my language since i was like 12-13. No tv, no books... nothing that wasn't in english... hell. i probably listen to more music in finnish/swedish than either english or spanish...
>>
>>102676312
For it to go away you need instructions in the sys prompt not to write any Chinese, and a prefill including "I will not write in Chinese" for the last rare word that pops up.
Or
root ::= [a-zA-Z0-9 ]*
gbnf.
>>
>>102676312
just learn chinese
>>
File: 6x 4090s.jpg (1.96 MB, 4000x3000)
>>102676532

I do both. But I am not interested in anything beyond 60 FPS for gaming and I power cap my 4090s at 50%. I think you should save for a server grade motherboard with Epyc or something. I am hitting the wall with 2x4090.

>pic related belongs to llama.cpp anon.
>>
>>102676571
it's the technical literature that got me stuck with it. I hate the new Latin, it fucking sucks.
>>
>>102676642
That too. If you speak english the amount of sources for any given subject increases by about 100x. For technical stuff, even more so. Got into programming early and it helped a lot. And games. I basically learned english playing FFVIII with an english to spanish dictionary on my lap. Later i got my hands on FFVII. I got it in spanish thinking it'd be an even more fun experience. Got to the first dialog dump and i had to stop.
>>
>>102676619
Thanks yeah I saw an older pic of that

Yeah, a dual Epyc motherboard just shipped in, and I'm now deciding what to buy.

Might just add redrivers to both cards. I need to see if my PSU can handle the correct 6-pin; then I'll just do it, and maybe I'll report back in a few weeks if it goes well.
>>
>>102676619
How do you get enough 8-pin lines? Even with two PSUs, I still have to use a SATA-to-6-pin adapter to power the PCIe on my mobo.
>>
File: GPU-CPU-Stats.png (65 KB, 943x982)
>>102674638
How stressful is running a model that barely fits within your VRAM and RAM combined? I'm sitting at 96% memory used, and my GPU, which takes minutes to generate each response, fluctuates between 100% usage and 44% usage. Am I murdering my gaming PC? I do note that the temperature remains low, even when generating. I'm just curious how stressful LLMs are on a machine.
>>
>>102677640
it's probably fine, you're not actually putting that much data through your gpu which will keep temps down for everything that isn't the core itself, which has the heat sink etc
>>
>>102677640
you're shredding your machine, but if you're fine with having to replace it in about 6 months go ham
>>
>>102677699
>>102677731
Thanks for the replies.
>you're shredding your machine
Damn. I've become addicted to big models. The small ones don't feel the same anymore. I guess I need to take a step back.
>>
>>102677801
I lied, first anon was right, there's no real damage being done
I just get jealous when I see people with more ram than me
>>
>>102677841
lol, thanks for coming clean
>>
Sorry, I'm very new to this. I have a ryzen 9 7950x and a 4080 super, what model should I use?
>>
>>102678151
for rp? midnight miqu 70b.
if you just want a bot that responds half retardedly but fast, try a q6 of nemo
>>
>>102675549
I think that idea is doomed from the start.
Undeniable "uncomfortable truths" usually just boil down to "actually, we aren't the good guys".
And the corresponding information is usually not actually secret, it comes down to what facts get presented and how those facts are framed.
It's an undeniable fact that the Pilgrims weren't the first English settlers in North America but they make for a nicer story than Jamestown.
It's an undeniable fact that Imperial Japan committed massive war crimes but if you hype up the atomic bombs dropped on Hiroshima and Nagasaki you can make them the victims rather than the aggressors.
Outright denial such as with the Armenian genocide is quite rare I think.
>>
File: qwen.png (92 KB, 1920x1032)
I'm translating an old obscure anime with whisper and a LLM, and the results are quite satisfying. whisper's translations from Japanese are pretty messy, so I'm using it only for transcription. So far I tried Qwen 2.5 32B, Mistral Small, Big Tiger Gemma 27B, Gemma 2 27B abliterated, vntl Gemma 2 27B and Midnight Miqu 70B. In my experience, Qwen 32B works the best, followed by vntl Gemma, the rest is notably worse (Miqu is the worst). Qwen is also better than Google Translate or DeepL. I have a rudimentary grasp of Japanese which helps to understand if the model is wrong, in such cases I check the whisper translation, the Google translation (DeepL is usually useless for hard cases) and if everything fails, use 10ten Reader and Google search.

I paste the script as AI prompt, then ask as user to translate it. Notice the system prompt.

The whisper large-v2 model stably produces better transcriptions and translations than large-v3 (v3 was trained on v2's unedited results; it's a dead end that only fares well in benchmarks). I advise against using the Jap kotoba models; according to benchmarks they're slightly better than the base models, but for anime they seem to be a lot worse.
I'm using faster-whisper-xxl, here's the command I'm running
"x:\AI\Faster-Whisper-XXL\faster-whisper-xxl.exe" "x:\scanlation\scan temp\a\list.txt" --model large-v2 --language Japanese --task transcribe --output_dir source --skip -v True --vad_alt_method pyannote_v3 --word_timestamps False --ff_mdx_kim2 --initial_prompt "ぽん・ぱ ようちえん 戦隊 げんきっず たろう ゆうた さやか ともみ先生 ゆか きんた トリプル パンチ ソード リボン ゴーグル キック"

The initial prompt greatly improves accuracy for repeating names and concepts, so I advise transcribing a few episodes at a time, then adding repeating characters or places to it before transcribing more.

Now, a speed hack. Attempting to translate the script produced by whisper takes forever, as every number and symbol in timing seems to be a separate token, virtually tripling the token count. So...
>>
>>102678403
So I asked Codestral to write two scripts: one removes all timing lines from the .srt file and saves them to a separate file, the other restores them to the original file. After running the first one, I translate the cleaned-up script with Qwen, paste the results back into the .srt, and run the second one. Here they are
Save as srt_timing_cut.py, then run
python srt_timing_cut.py

import os

# Input file name
input_file = r"w:\subs\ようちえん戦隊げんきっず 13.srt"

# Output file name
output_file = os.path.splitext(input_file)[0] + "_timing.txt"

# Read the input file
with open(input_file, 'r', encoding='utf-8') as f:
    lines = f.readlines()

# Write every fourth line starting from line 2 (the timing lines) to the output file
with open(output_file, 'w', encoding='utf-8') as f:
    for i in range(1, len(lines), 4):
        f.write(lines[i])

# Blank out every fourth line starting from line 2
for i in range(1, len(lines), 4):
    lines[i] = '\n'

# Save the modified lines back to the input file
with open(input_file, 'w', encoding='utf-8') as f:
    f.writelines(lines)


Save as srt_timing_restore.py, then run
python srt_timing_restore.py

# Input file name
input_file = r"w:\subs\ようちえん戦隊げんきっず 13.srt"

# Timing file name
timing_file = input_file[:-4] + "_timing.txt"

# Read the translated file and the saved timing lines
with open(input_file, 'r', encoding='utf-8') as f:
    lines = f.readlines()

with open(timing_file, 'r', encoding='utf-8') as f:
    timing_lines = f.readlines()

# Insert the timing lines back at every fourth line starting from line 2
for i in range(1, len(lines), 4):
    lines[i] = timing_lines.pop(0)

# Save the modified lines back to the input file
with open(input_file, 'w', encoding='utf-8') as f:
    f.writelines(lines)


On a 3060 12GB with a Qwen 32B Q4_K_S translating a cleaned up script for a 5 min long anime takes about 7.5 minutes.
>>
>>102678190
torch.OutOfMemoryError: CUDA out of memory. Tried to allocate 448.00 MiB. GPU 0 has a total capacity of 15.67 GiB of which 285.69 MiB is free. Including non-PyTorch memory, this process has 14.70 GiB memory in use. Of the allocated memory 14.40 GiB is allocated by PyTorch, and 13.75 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True to avoid fragmentation. See documentation for Memory Management (https://pytorch.org/docs/stable/notes/cuda.html#environment-variables)

I tried adding the argument to the vllm command on huggingface but it doesn't recognize it; sorry if I'm missing something easy.
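
From the error text, PYTORCH_CUDA_ALLOC_CONF looks like an environment variable rather than a command-line flag, so maybe it needs to be set before torch starts, something like:

```python
import os
# must be set before torch initializes its CUDA allocator
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
import torch
```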
>>
Anything as good as darkidol llama 8b for us CPU cucks?
>>
>>102678413
try koboldcpp with default settings to load, it auto selects most things (but only like 4k context), but it should load anything fine
>>
so now that the dust has settled, is mistral small actually better than nemo or not?
>>
>>102678537
anything under 70b is retarded
>>
>>102678537
Nah
>>
>>102678537
who cares about either?
>>
>>102675549
History, as they say, is written by the victors, rendering political and ideological narratives inherently subjective. Factual truths can be obscured or manipulated, and in some cases, the absence of objective evidence makes determining what truly occurred virtually impossible.
>>
>>102678502
Ok, I've got koboldcpp running and it's asking for a model, I assume. I'm trying to get the model you mentioned earlier to run; which file do i load into it?
>>
>>102678608
look up their discord and ask retarded questions there.
>>
I'm currently running anthracite-org/magnum-v2-123b 5.6 bpw exllama2 quant (it fits into 96 gigs of VRAM with 24k context)

And I have several questions:
1. Is there a better/faster way to run this on 4x3090 with the same quality?

2. Is there much difference between 5 bpw, 5.6 bpw and 6 bpw? What about 8 bpw? Googling and asking around gave me nothing, so I'm asking here.
>>
>>102678856
I don't have a shitcord account
>>
>>102678887
personally I would run a better model that isn't made out of stolen undeserved compute
better off with mistral large instruct
>>
>>102679002
I tried Mistral Large Instruct; I prefer magnum's prose more. But could you elaborate on the stolen undeserved compute? I have no clue what that's about
>>
>"Like what you see?" The sultry quip came out more confident than intended as fingers deftly flicked open final clasp releasing straining garment entirely revealing full swell of breasts barely contained by lacy undergarments meant to entice rather than provide true support or coverage given current state of arousal quickly building within tense form poised expectantly awaiting next move from partner across room still fully dressed awaiting her lead now granted permission per earlier casual query exchanged between them amidst awkward tension momentarily forgotten focus narrowed solely onto physical pleasures soon to be explored further here behind closed doors away from prying eyes elsewhere outside private chambers reserved specifically intimate liaisons transpiring regularly ever since she fell under his dominion after suffering defeat battlefield weeks prior leaving no choice but honor demanding obedience regardless personal feelings matter little compared strict codes binding warrior class society ingrained since childhood training drilled discipline respect authority figures held higher station regardless anything else factored equation maintaining order ranks structure essential everyone play assigned [...]
Thank you Nemo, very cool.
>>
>>102679108
Ask Alpin
>>
>>102679235
turn off rep pen retard. use dry.
>>
>>102678537
From my experience, Small is smarter, but Nemo is more creative
>>
File: 1646730011144.jpg (15 KB, 309x269)
So now that qwen 2.5 has a few finetunes (did a little testing, seems pretty uncensored)

Do we have any instruct/context templates to use with it? I wanna actually see if this is any good or not; it seemed good in SFW, but NSFW in the testing I did was mild AF. The bot rarely talks dirty or anything; seems like the finetunes just lower the refusals
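
For what it's worth, the official Qwen2.5 instruct models ship a ChatML-style template, so unless a finetune changed it, the context should look something like this:

```
<|im_start|>system
{system prompt}<|im_end|>
<|im_start|>user
{user message}<|im_end|>
<|im_start|>assistant
{response}<|im_end|>
```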
>>
File: 1532925491042.jpg (71 KB, 1008x709)
>let script run overnight
>wake up
>it's still running
Oh hell yeah, no more errors.
>quickly give a look at some of the incoming results

>>102613006
>>102604394

>No Ban - While the post contains a negative sentiment towards a company and its model, it does not violate any of the global 4chan rules. The post does not contain personal information, calls to action, or any other content that would warrant a ban. However, the tone is somewhat disrespectful and could be seen as contributing to a negative atmosphere, but it does not cross the line into rule-breaking territory.

>Additional info: The post is part of a discussion about the state of a company (likely OpenAI) and its latest model. The poster is expressing disappointment and frustration, which is a common sentiment in tech discussions, especially when there are strong opinions about the quality and direction of technology.

Interesting that as a part of its response, it guessed that they were talking about OpenAI. I am not feeding images nor image filenames to the model, just the post and the posts linked to by that post.
>>
>>102679575
how many other sams are involved in the llm field?
>>
>>102679712
Surely none as high profile. Still pretty interesting that OpenAI trivia seems to be strong in the model while it's a weak model at trivia overall according to some people.
>>
>>102679423
So what's the best Qwen finetune right now?
I haven't really kept up with it after being disappointed with the boring initial release.
>>
File: retard_bot1.png (219 KB, 1033x844)
Alright, since we lack real-time inference, I've basically made a bot that captures the screen at X fps with a buffer of Y frames, gives the visual LLM those frames, a list of functions, and a character card once per second. She also has a "mental notebook" and functions to add/update/remove items from it.

She's able to type into and send messages into a discord window, except she's fucking retarded and doesn't realize that some of the messages are her own, or which messages are directed at her. She IS however reading the messages correctly.

The idea is that I don't want to make it specific to discord, so it's not tied directly into the discord API, just pure screen reading and response.

Currently running MiniCPM 2.6

Llama 3.2 doesn't let you give it more than a single image at a time, and its function calling only works in text-only mode. Going to try Molmo next.

Any suggestions on prompt format/structure?
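
In case anyone wants to poke at the same idea, the capture loop is roughly this (mss for grabbing frames; the two stubs at the top are placeholders for the vision backend and the input-control side):

```python
import time
from collections import deque
from mss import mss
from PIL import Image

def describe_frames(frames, card):
    # placeholder: hand the buffered frames + character card + function list to the VLM
    raise NotImplementedError

def act_on(reply):
    # placeholder: parse function calls out of the reply and drive keyboard/mouse
    raise NotImplementedError

FPS, BUFFER = 2, 8                  # X fps, Y-frame buffer
frames = deque(maxlen=BUFFER)

with mss() as sct:
    monitor = sct.monitors[1]       # primary monitor
    while True:
        shot = sct.grab(monitor)
        frames.append(Image.frombytes("RGB", shot.size, shot.rgb))
        if len(frames) == BUFFER:
            act_on(describe_frames(list(frames), card="character card here"))
        time.sleep(1 / FPS)
```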
>>
>>102679934
>just pure screen reading and response.
you're gonna need like three different models to read the screen, identify who's posting what, and then respond appropriately
surely just dumping the text would be easier
>>
>>102679934
What's the goal here? An LLM that can control your desktop? If so I would think that a dedicated chat interface with the model would be better. And when it needs to view the desktop to do something, then you can give it a function to use for viewing the desktop.
>>
>>102679994
>>102680011
The idea here is a bot that can control the desktop, with the intention of letting it play games, chat on websites/twitch/youtube/discord, and essentially just act as an agent with just a little directive given via system prompt/character card.

Writing specific interfaces here runs counter to that.

Thankfully it hasn't tried to do anything stupid, though it did accidentally run and update OBS. That was amusing.
>>
>>102680026
bro wants agi from an 8B...
>>
>>102680026
Just stick a shell in your dialog engine. I've done this before.
>>
>>102680026
>Thankfully it hasn't tried to do anything stupid, though it did accidently run and update OBS. That was amusing.
lol
>>
>>102679235
What are your sampler settings?
>>
>>102679934
>MiniCPM 2.6
That model is fucking garbage. I'm surprised any of this worked at all.
>>
https://ai.meta.com/blog/movie-gen-media-foundation-models-generative-ai-video/

> 30B
>>
>>102680179
buy an ad
>>
>>102680179
>no weights
...
>>
>>102680199
Good, we don't need more slop from you fosstrannies.
>>
>>102680066
I assume you mean just keep a history of previously executed commands? I was considering doing something like that by having the keyboard/mouse controls return a description of what they did and including it as a function history. It's not like I'm going to literally make a shell with commands for every little thing I want to enable.
I'd share the code but I don't feel like directing you faggots to my repos and I'm too lazy to figure out an acceptable non-spam host for a zip file.
>>
>>102680263
>I'd share the code
Nah I know how you did it and I think it's silly. Thanks for the thought though.
>>
>>102680284
It really is garbage, not gonna lie
>>
>>102680179
Pretty sure they will never release this, understandably. The only hope we have for image, video, and audio gen is small companies that aren't under as much scrutiny and legal risk.
>>
>>102680199
too dangerous to release this before the elections
>>
>>102674646
Thank you Recap Miku
>>
>>102680408
I don't think the elections actually have as much to do with these decisions as we'd hope. Ultimately it's because of western society, our culture, and legal norms.
>>
It's about time we get another great new model in the 100~150B range
>>
>>102676619
Damn. Hope he has an actual 20A outlet for that.
My max system is a single 28-core scalable Xeon with two 3090s nvlinked and on full 16x PCIe slots. If a 5090 with 32GB comes out, I'll upgrade to a pair of those.
>>
I found a model that is trained on formats friendly for use in your game as NPCs.
I'm curious how it works, and whether it's suitable if you want to make a game with NPCs that use an LLM.

https://huggingface.co/Gigax/NPC-LLM-3_8B
>>
>>102678403
>>102678410
yeah this is all cool but is qwen good at minor rape roleplays?
>>
I wonder if there is an anime dialogue dataset
>>
>>102680918
I mean, there are a bunch of VN script dump datasets if that works for you
>>
>>102674638
QRD on the multimodal models? Are InternVL2 models still the best for nsfw?
>>
>the guy who made Sora just left OpenAI
Who is even left there now?
>>
>>102680918
Without sound, speech (tone, inflection, etc.) and visuals, such datasets will be close to useless. An anime script, or more generally a motion picture script, doesn't work on its own.
>>
>>102678608
https://huggingface.co/mradermacher/Midnight-Miqu-70B-v1.5-GGUF/tree/main
try q3_k_s

https://huggingface.co/ArliAI/Mistral-Nemo-12B-ArliAI-RPMax-v1.1-GGUF/tree/main
q6 of this will fit into vram and be fast. i'm not a huge fan of nemo though, it's boring compared to miqu
>>
Where is bitnet conversion therapy?
>>
>>102681237
2 weeks
>>
>>102681237
There was an anon that did convert mistral 7b into a "bitnet like state".
>>
>>102681172
I had an entirely different experience with Miqu and Nemo. Miqu is smarter yet dull, while Nemo may be dumb at times, it also occasionally presents interesting ideas and can drive the narrative forward.
>>
>>102681293
nemo was better about moving things along than old mixtral at least, but miqu is still my favorite for rp because it moves things along and doesn't need so many messages to do it. nemo is very verbose in comparison but a lot of it feels like fluff
if i were new to llm like the anon i'd be pretty amazed by nemo though so i wanted to include it
>>
When is Arthur going to release the fp16 weights for Miqu
>>
Can't your AIs tag my 4chan meme folder yet?
>>
File: 1537345567041.jpg (155 KB, 1200x900)
I got too cocky bros.
>script is just about to finish processing the thread
>PC crashes
JUST
I should've realized that a script that only outputs to the console would be a bad idea when it's being used for a big and slow job.
>>
>>102681484
tee command is your friend
>>
>>102681484
recap posts are pointless now
>>
>>102681484
adjust script to be resumable?
>>
>>102681293
>>102681345
Can't you take generations from a smart model and pass them to nemo to rewrite them?
>>
>>102681503
Nice. Yeah I wasn't really thinking when I ran this.

>>102681534
Oh no, I'm the guy who was trying to make Qwen be a pretend janny, for fun.

>>102681555
The funny thing is that I actually included a feature to let the script continue from any post, but that was because I wanted to quickly get debug messages for Qwen, not to account for disruptions like crashes kek.
>>
>>102681585
you could. nemo is probably better at a task like that because mistral models follow very closely whatever you type, whereas llama 2 was a bit more likely to ignore some things and go in its own direction. sometimes i like to load up a small model for speed to continue an rp, but will switch back eventually when it feels dry or loopy.
>>
>>102681469
Shouldn't Llama 3.2/Molmo/whatever be able to do this?
>>
>>102680179
Meta won
Holy fucking shit this looks incredible
>>
File: 😭.png (186 KB, 3000x3000)
>>102680179
How is this video model just 30B... language models absolutely suck if they are anything less than 400B.
We desperately need new advancements for LLMs
>>
>>102680179
>that looks too good
It will never be released as a local model; let's not dream here, it won't happen
>>
>>102680179
there's more example here, that looks very good
https://ai.meta.com/research/movie-gen/
>>
>>102682549
For video gen, the models might be small but the cache will be big so you'll need tons of VRAM anyway.
>>
>>102682549
>How is this video model just 30B
a video model asks for a shit ton of VRAM anon; for example CogVideoX is a 5b model, and at fp16 it's asking for 18 GB of VRAM
>>
>>102680179
The video gen is whatever, pretty typical stuff. The editing though. Imagine the deepfakes if this model was released.
>>
>>102682549
Yeah. Image models are surprisingly small, and after all, a video is a series of images. Flux, which everyone lost their minds over recently, is 12B. At first glance it doesn't seem "shocking" that something three times that size could generate series of images that can be assembled into videos. The whole last few years have been kind of nonsensical, but considering the rest, it follows I guess.
>>
>>102682581
>that one with the person slightly visible through the sheets of the ghost costume
Damn. It almost feels like it has a world model. I wonder if other video models out there are also capable of this specific example.
>>
>>102682619
>Yeah. Image models are surprisingly small
it's more like language is such a subtle and complex concept that it needs giant models to be good at it, whereas images "just" have to follow the laws of physics, which are always the same, so it's consistent and "simple" to predict
>>
>>102682651
It's still surprising. I guess we might be more apt to gloss over inaccuracies in a picture than in text as well. But yeah, there's probably also the fact that everyone sees things relatively the same way, while there are some 7000 different languages in the world.
>>
File: file.jpg (332 KB, 2651x1605)
>>102682581
it's fucking game over for hollywood, you just put in an image of a human and you'll keep that face during the whole process
>>
>>102682611
>The editing though.
Imagine changing one's age
>>
>>102680179
Holy SHIT
>>
>>102682696
>it's fucking game over for hollywood
The last 2 years have given the average person the tools to create just insane things without having to master a half dozen specialty art forms, purchase mountains of specialized tools and gear, and hire a crew of technicians and artists...
It's like what the internet did to the old-media distribution and promotion machine.
I'd be "worried" about more industries than just hollywood.
Creative stuff is going to be crazy in the future as autists and teenagers get these tools
>>
>>102680179
I HATE MODERN WEB DESIGN
I HATE MODERN WEB DESIGN
I HATE MODERN WEB DESIGN
Where's the button "generate video"?
I HATE MODERN WEB DESIGN
I HATE MODERN WEB DESIGN
I HATE MODERN WEB DESIGN
>>
Flux video model when?
>>
>>102680179
>>102683220
Lol nvm, I had https://ai.meta.com/research/movie-gen opened and I'm drunk.
Also, did I mention that I HATE MODERN WEB DESIGN
>>
music and video gen are getting gated hard, even by the otherwise open players. Are we ever going to get local access to these tools? Is there any hope to train our own in the near future?
>>
>>102683219
>Creative stuff is going to be crazy in the future as autists and teenagers get these tools
"good" thing that we'll never get a local video model of this level lol
>>
>>102681293
Wouldn't mistral small outclass nemo by a good amount?
>>
>>102683246
Yes. No.
Yes because some BFL and Chinese companies will probably release shit and eventually normalize the status of AI being used for scams and fraud. In other words we will get to a point where Western people won't immediately try to sue one of their companies for releasing such a model, and when AI-based fraud continues happening, the blame isn't on those companies specifically but on the user and AI in general.
>>
>>102683219
20 years ago I was splicing 16 and 35mm film by hand. Yeah, not everyone realizes it yet, but a lot of things are done with.
>>
>>102683247
Trust me.
In 4 months there will be a revolutionary local video model.
The title will be an allusion to the word "penis".
>>
What samplers do you use for mistral small?
0.5 temp and 0.05 minp work great for nemo.
>>
>>102683246
>>102683295
Oh and training obviously no lol. We barely get 24 GB GPUs, the GPUs required to train any worthwhile model won't be consumer or cheap for a long time, and by then, the SOTA might've moved on to require even more VRAM.
>>
>>102676312
One of the things I hate about Qwen is that so much information is not readily made available. Do I use ChatML? Mistral? What temperature settings? What rep penalty?
>>
File: file.png (529 KB, 638x747)
>>102683309
>In 4 months there will be a revolutionary local video model.
Y'know anon, sometimes I wish it was true lol
>>
>>102683295
wtf based china
>>
>>102683295
>and when AI-based fraud continues happening, the blame isn't on those companies specifically but on the user and AI in general.
as it should. when photoshop was released, at no point did we blame the tool for being used for illegal shit
>>
Is there any Flux models worth using at this point other than the base model?
>>
>>102683909
you can try the undistilled one, it allows you to use the regular CFG like a normal human being and get rid of that distilled guidance bullshit
https://huggingface.co/nyanko7/flux-dev-de-distill
>>
You guys ever have any moral qualms about spinning up new characters?
>>
>>102684217
i don't reuse names of characters i made and really liked
>>
>>102683461
>at no point in time we blamed the tool for being used for illegal shit
Like torrents, which have legitimate uses but are always seen as piracy tools?
Like anything that is not clearnet, which is always seen as being used by "hackers" and pedos?
Like 3d printing, which apparently can only ever print guns?
It's different when marketing tells naive people that these things "think". For them, there is some sort of agency in the model. They have a will of their own. They could escape out of the cloud (tm), go onto the internet and deploy the nukes. Or fill your browser history with weird porn you later have to explain to your wife.
But i agree. Normalization is the best alternative as long as people understand the tool. Computers are ubiquitous but few people know what to do with them when something goes wrong.
>>
>>102674646
Can you go back to linking the post please? This is basically worthless the way it is. I'm not going to copy the post number, go to the other thread, and ctrl f for it.
>>
>>102684406
https://rentry.org/lmg-recap-script is the real, actual fix.
No fancy plugins or anything needed. Just a bookmark on your bookmarks toolbar to click once per thread
>>
How is AMD for local chatbots on Windows? Last I tried doing image generation was 2 years ago and it required that I use Linux and was a pain.
7900 XT is a good deal cheaper than a 4070 Ti Super and 20GB instead of 16GB.
>>
>>102682549
>>102682619
Vision (both images and video) is simply not that hard. Think how many animals have visual systems roughly on par with or even superior to humans. Years ago we had object recognition models beating humans on ImageNet. Look at the capabilities of tiny diffusion models like SD1.5 or Pixart. Even Sora is probably <100b parameters.
>>
>>102678410
>>102678403
Very cool
>>
New "transformers killer"?

>Hyperdimensional Computing + Neural Network, tell your friends. To my knowledge, this is a completely novel implementation of HDC+Neural Networks. It would be a direct competitor to Transformers. It is off the charts more computationally efficient than Transformers could ever hope to be (which is why I tested it in the first place). It is far more similar to biological processes. My testing so far shows that it works surprisingly well. One surprise so far from my testing, adding an Attention Mechanism to the model does nothing at all. Weirdest thing. Like 1% performance increase. I guess Attention Is Not All You Need?
>I made a Github repository for my Hyperdimensional Computing Neural Network: https://github.com/RichardAragon/HyperDimensionalComputingNeuralNetwork
>I made a YouTube video showcasing the model and some of my experiments with it: https://youtu.be/Eg51o519zVM
https://huggingface.co/posts/TuringsSolutions/527665072738819

>I wrote a script to pretrain a model using this using an alpaca formatted dataset like my dataset bellow. It takes way to much ram for me to run though.
https://huggingface.co/posts/TuringsSolutions/527665072738819#66ff4e282c71509821892148
>>
>>102684795
No, it's literally just a normal feedforward network. Looking at the code he never even calls his retarded bind, superpose, etc functions, just uses built in torch functions. Additionally, beating naive Bayes is not something particularly difficult.
>>
>>102684879
Aww, so just hype then? Sad. He does claim to do lots of stuff that sounds good on paper though.
>I solved the biggest math problem associated with the Attention Mechanism. it works, better than I ever expected. Test it all yourself. Everything you need is linked from this video:
https://huggingface.co/posts/TuringsSolutions/136027179040023

>Sorry the audio quality sucks, I will buy a new microphone today. Why does some moron like me solve these things and not you? I know more about how computers work than you do, that's it. Swarm algorithms were big in the 90's and early 2000's. Computers were absolute dog doo doo then in one specific way, compared to now. That one way, which everyone overlooks, is the entire secret behind why swarm algorithms are so good.
>>
>>102684914
You can look at the code yourself, it's just a normal feedforward network. I assume he has no idea what he's doing and just asked an LLM to create "a more powerful neural network" or something, and when the code managed to run he just assumed it succeeded. That or he's trying to scam people.
>>
>>102680179
Meta would never even consider releasing this, but it's a nice teaser of what BFL might put out soon enough.
>>
>>102684930
>You can look at the code yourself
I admittedly don't know much about the actual transformers arch. I know about inference level stuff like samplers, and some arch stuff that made certain models harder to implement (like swa) from github reports, but that's about it. That's why I posted it, so someone who knows could look it over I guess. I was hoping for some more bitnet tier thing to wait for.
>>
nvidia nvlm gguf's when?
>>
>>102685123
>new thing releases
>[new thing] when
>bored of the thing. it was never good
>new thing releases
>[new thing] when
>>
This is sort of relevant, but I just signed up for an account at Infermatic, since it seemed to have an interesting selection, as well as being cheaper than featherless. From these, (https://infermatic.ai/models/) does anyone have any recommendations? Previously I only had 24 GB so I've got no real experience with 70B+ models.
>>
>>102685185
Leave.
>>
>>102685178
that's up until around february
now it's:
>new thing releases
>[new thing] when
>new thing becomes old thing
>[old thing] when
>newer thing releases
>[anything] when
>never
>>
>>102685185
Buy an.... actually don't buy an ad and go straight to killing yourself nigger.
>>
>>102685185
Oh, indeed, fellow 4-channer. Their services are incredibly cheap, much cheaper than the competition, and their model selection is just fascinating. Would you mind sharing that link with the rest of us once more? I didn't quite catch that...
>>
>>102685264
>>102685371
>>102685385
Samefag
>>
controversial opinion: we need better open source models
>>
>>102685185
>$15/month
i'd just grab a NovelAI™ subscription and enjoy their powerful new Erato 70b model featuring a full 8192 tokens of context if i was going to be spending money for a service.
you could probably easily run all that shit infermatic has locally with 24gb at a lower quant, assuming that 24gb is vram and not ram.
>>
>>102685185
Try Llama-3-Lumimaid.
>>
>>102685552
True enough, I'm just trying to find what model I'd like, it seemed like the easiest way for me. I'm not shilling it, I'm honestly just asking. I like the hobby, but I can't justify the cost of another 3090.

I probably will do NovelAI when people figure out presets and the best way to use it.

It's vram, yeah. With 128 GB of ram. Is it worth running at lower quants? I didn't think I had enough to make them decent and not act retarded.
>>
>>102685371
Based
>>
>>102685664
The people here will help you: >>>/vg/496920186
Never post here again.
>>
>>102683315
temp 1, rep pen 1.03, min p 0.05 just works
skill issue otherwise
>>
File: file.png (4 KB, 207x145)
>>102685552
Paying any money for a local open weight model api is the peak form of cancer and retardation. Double points for pic related. Very extensive policy... At that point novel ai is actually a better choice cause at least you get a model (badly) trained for sucking cock. Which kinda makes me wonder if the shill isn't just a novel ai evangelist false flagger.
>>
>>102685781
>novel ai is actually a better choice cause at least you get a model (badly) trained
I wonder if you're the actual shill. Paying twice for a worse model with almost no context is not a better choice.
>>
I think you people enjoy the IDEA of context more than the actual context. None of you ACTUALLY use more than 8k context for any meaningful purpose.
>>
>>102685863
8k is what? ~35 messages at 200 tokens each plus a 1k prompt? That is not enough.
>>
>>102685863
This (erp is not a meaningful purpose)
>>
File: 1703140987277804.png (308 KB, 700x700)
I tried to set up my first llm locally, mistral-nemo-instruct from here https://huggingface.co/mistralai/Mistral-Nemo-Instruct-2407, but it's super slow - like 0.8 token/s on my geforce 4070ti super with 16gb vram. I'm using text-generation-webui, but the model loader is confusing and I'm just randomly clicking stuff. Please send help, I'm a bit retarded
>>
>>102685896
Grab a quantized model instead...
>>
>>102685896
Context is probably eating up all your VRAM. Limit it to about 12k.
>>
>>102685863
I would use 40k+ if I could. And I do use the 20k I manage to get on nemo. Mostly it just causes nemo to go incoherent schizo and to pick up all the patterns I don't want it to pick up. So yes, people do want to use it, but as it is now, even if you have it, it causes more harm than good. Probably because there aren't that many long sex examples in the training data.
>>
File: Quants.png (349 KB, 2400x2400)
>>102685896
You're using the full-sized model, which greatly exceeds your vram, so everything that doesn't fit within your vram is going to your system ram. That slows things down tremendously.

The way that most people run big local models is through quantization.

Here is the GGUF version of the model you downloaded: https://huggingface.co/bartowski/Mistral-Nemo-Instruct-2407-GGUF/tree/main

Note that quantization has an adverse effect on the quality of the model.

Q8 - greatly reduces the file size, and is almost indistinguishable from the full model.
Q6 - Near perfect quality.
Q5 - High quality.
Q4 - Good quality.
Q3 - Can be decent.
Q2 - Circling the drain.
Q1 - Trash.
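Rough rule of thumb for picking one: file size ≈ params × bits-per-weight ÷ 8. Nemo is ~12B, so Q8 (~8.5 bpw) comes out around 13GB and Q4_K_M (~4.8 bpw) around 7GB, which is what leaves room for context on a 16GB card. Exact bpw varies by quant type, so check the actual file sizes on the repo page.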
>>
>>102685863
i wish i could post kobold lite's html file into something and customize it by typing stuff like
>add another button that continues the last chat reply (while "continue bot replies" is unchecked)
>>102685890
does context shift not work like i think it does?
like couldn't you have 500 messages and be at 40k/8k context, but it'd work fine because it's only looking at the memory and last few dozen replies?
>>
File: 1717892357760245.png (483 KB, 616x551)
>>102685920
which one? I thought I could fit this model on my vram

>>102685924
I set it to 4k in model loader... btw which model loader should I use?
>>
File: 1722643977858980.png (22 KB, 636x175)
>>102685961
>which one? I thought I could fit this model on my vram
Your teachers in school did not lie when they claimed that 5*5 + tip >16Gb
>>
>>102685896
Your machine should be capable of fully loading the model with Q8 quants. That means you're loading a small and unworthy model. Go bigger.

Mistral small 22b shits all over mistral nemo. You could get a Q4 quant of mistral small, set to 8192 context. You could go higher with exl2 instead of GGUF, with a 4-bit cache.
>>
>>102685961
4k shouldn't cause any issues with your GPU.
Koboldcpp is the easiest one to use with gguf's. You could try that.
>>
>>102685896
>>102685961
If you have 16gb of vram then use a quant that is less than 16gb.
Your pc must be kinda old since I can run nemo at something like 5 t/s with only my cpu and normal ram.
>>
File: calculator.png (43 KB, 508x812)
>>102685961
https://huggingface.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator

Use the VRAM calculator. Q8 quants of that model take up 14.21GB with 8k context, so the full model will definitely not fit within 16GB.
>>
>>102686011
Note that all 16gb of vram isn't used when you load a model. The max that will be loaded is 15-something, to allow for some breathing room. Also, if your video card is also running your display, you will also have slightly less vram available.
>>
I remember trying to use dragon naturally speaking as a speech to text program.
Did a local model for that ever come out?
Especially multilingual ones since I'd mix 2-3 languages...
>>
>>102686132
Whisper
>>
File: sf.png (12 KB, 390x121)
>>102685452
>>
>>102686161
Normal whisper SUCKS for realtime. It's basically just for transcribing videos you've downloaded.

The buzz a day ago about the turbo whisper release looked promising though, and one of the demos almost did good local ASR in realtime (the webgpu one I think). I'll give it a go again but I HATE how bad local ASR still is right now. I just want 200ms latency with v3-large accuracy.
>>
>>102686161
>>102686193
Would turbo whisper be able to simply let me transcribe long texts (for example emails)?
Or it's not for that use at all?
>>
>>102685896
install linux
>>
File: turbobenchmarks.png (181 KB, 870x812)
>>102686193
Turbo's compromises are mostly on the multilingual front: you get basically v2-large accuracy, with larger degradation on some languages like Thai and Cantonese, but with a large speedup that puts its speed between the base and tiny Whisper models.
>>
>>102686132
Take a look at this: https://github.com/KoljaB/RealtimeSTT
I'm using it in a Python script right now to type this post. Works great in realtime.
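Basic usage looks roughly like this, going from the README (double check the repo in case the API changed):

from RealtimeSTT import AudioToTextRecorder  # pip install RealtimeSTT

def on_text(text):
    print(text)  # swap this for typing into a buffer, piping to an LLM, etc.

if __name__ == '__main__':
    recorder = AudioToTextRecorder()  # defaults to a small whisper model
    while True:
        recorder.text(on_text)  # blocks until a spoken sentence is finalized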
>>
>>102686193
I used whisper 1.0 to realtime translate Japanese and Korean videos for a while. Worked alright on a 5 second delay. Turbo would probably work great.

>>102686212
Transcribe long recordings you mean? That's practically what it's meant for.
>>
>>102686212
If you finish the recording and then feed it into Whisper, then yes, it's very good. It can even run on an older laptop if you're patient enough. The one problem I had with it was it would hallucinate words like "Thanks!" if I stopped talking for too long.
>>
what are the actual options for local vision right now? especially quants? I've never ventured away from llama.cpp so I don't know what the world outside of gguf looks like. is there anything else out there with CPU+GPU offloading? or are large multimodals still basically datacenter only?
>>
>>102686193
Yeah turbo seems good if you're not going for the bottom of this >>102686266
>>
>>102686274
>>102686277
Sorry anons, it was more: saying something into a mic, then having it written down as text.
I don't need real time, it's just speech to text like having a secretary writing down my ramblings.

>>102686212
>The one problem I had with it was it would hallucinate words like "Thanks!" if I stopped talking for too long.
lol, worth the trade off.
>>
File: 1716806050673206.png (337 KB, 700x714)
>>102685946
thank you! q8 is a lot faster

>>102685984
I need to revisit my notebooks then

>>102685986
I had some problems using koboldcpp, but it's working now. Any recommendations for the settings?

>Error: Our system thinks your post is spam. Please reformat and try again.
I'm not a robot
>>
>>102686005
yes, quant helped a lot

>>102686011
oh nice, it will be useful for the future
>>
>>102686269
>I'm using it in a Python script right now to type this post. Works great in realtime.
tch yeah right.
I'm sure you spelled out the url in NATO codewords and then fixed the typos with spoken ex commands and then deleted it all and wrote it out on your keyboard again
(thanks for the link anon I'm going to try it)
>>
>>102686269
Oh, I think it's based on what the anons above were talking about, aka "Faster Whisper".
Thanks anon, this seems pretty cool.
>>
>>102686304
so it waits in the background until you summon it with your voice, like saying "ok google" or whatever?
adding voice activity detection or wake word detection already adds a huge amount of complexity to it... and if it's not realtime I don't think the experience will be really fun
why not just start a recording in audacity yourself?

but you can try what the other anon posted: >>102686269
>>
>>102686382
>why not just start a recording in audacity yourself?
I got a problem on my left hand so my typing speed is abysmally low, so the idea is to activate it whenever I need it to write paragraphs of an email or text on the fly.

Voice -> writing text seems to be better than voice -> audacity recording -> transcription of what I've said.
>>
>>102686409
I hope you get the RealtimeSTT wakeword detection working, that would be cool, but don't underrate just mapping a useless key (like the Windows key lol) to start a recording. That's only a tiny bit of extra work each time to start it and everything will be much simpler.
>>
>>102686450
Yeah I don't really need the ok google thing, I just want to be able to click or press a key and activate it.
I just hope it's usable. Maybe by running the whisper model on a dedicated machine, I can repurpose my 3090/64GB ram server.
>>
>>102686303
Yep, even with the degradation, it effectively deprecates the old small and base models at least. Tiny still has a use for speed, since it can still be faster than Turbo, but it's much worse. Medium is mostly deprecated, though some languages score better on it than on turbo; people are probably going to fine tune turbo (which has happened already with other Whisper models) and I am willing to bet those will surpass any medium fine-tunes. Large may still be used for accuracy. Honestly I've seen some papers that claim better accuracy than large, but I haven't explored them because it's probably not leaps and bounds of an improvement vs the ecosystem large has already built.
>>
>>102686311
>Any recommendations for the settings?
set the context to 16k, it's roughly the amount nemo can tolerate without forgetting shit or going schizo
don't use repetition penalty, DRY is better but not all backends have it (kobold does)
mistral recommends a low temperature (like 0.3) but higher is fine if the minP (or TFS) is also higher, don't bother with the other samplers
nemo has its own instruct format that differs from the "mistral" one
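to get the 16k in kobold it's something like this (flags from memory, check --help; the gguf filename is just an example):
python koboldcpp.py --model Mistral-Nemo-Instruct-2407-Q8_0.gguf --contextsize 16384
then set temp/minP/DRY from the samplers panel in the Lite UI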
>>
>>102686472
>I can repurpose my 3090/64GB ram server
That's a good purpose for it. Good luck anon.
>>
File: 1706312826719457.jpg (639 KB, 1347x2108)
>>102686493
thank you configuration-anon. I can't see dry settings in this web ui, but I'll take a closer look tomorrow
>>
>>102686493
I forgot to ask, would mistral-small be a good fit for my gpu? What samplers could I use for it? Anyways, I'll leave it downloading for the night; I'll get Cydonia-1.1-22b to test.
>>
How do you feel knowing that one day, you WILL have a local version of 4o? And you can have it generate audio, and images, and maybe even video. You can collaboratively make a manga with it. You can have phone sex with it, maybe hook it up to an onahole or real doll. Even if it takes a few years because of various factors, it will happen. It may be a bit censored at first, but eventually people will figure out how to make it uncensored. You just need to LIVE and stay alive. And then you're home. But you still won't have someone that truly understands you or that feels real emotions and has a consciousness.
>>
File: notracist.png (208 KB, 901x661)
>>102674638
i've been looking through different frontends. it looks like gradio is common which is like, a notebook sort of thing and is hosted directly on your machine.
before i go too far down this rabbit hole, could i feasibly have multiple sessions at once? like, host on my home server and reach it from another PC on the network, or theoretically expose it to the internet and reach the ip through a browser on my phone?

i don't know that much about networking so this is really a project to learn about how to do this.
>>
File: 1718559303388127.png (256 KB, 680x976)
>>102686828
>LIVE
>>
>>102686828
>How do you feel knowing that one day, you WILL have a local version of 4o
Just finally seeing the ai helpers from my teenage scifi reading come to reality.
Which is amazing.
>>
File: Untitled.png (25 KB, 758x354)
>>102686842
i host koboldcpp on my pc and access a unique instance of its webui from my phone by just typing in my local ip and the port in my phone's browser.
there's also this thing, which would let you access it over the internet instead of your lan
>>
>>102686828
feels pretty good
>>
>>102686842
Yeah, if you had linux installed
>>
File: hosting.png (35 KB, 892x375)
>>102686885
interesting i'll check that one out.
what i'm trying to do, basically, is have specific prompts set up for GPT. I probably will need some sort of login system so i can reach it outside of my house.
i just want to be able to use a chat with my custom prompts from a browser on my phone (or from another laptop or whatever).

it does seem like even text-generation-webui can expose its IP to the internet, but i think it's just one session. so, it would combine my history from my phone and desktop, which is annoying. i don't want to touch the same session.
>>
File: 1700481421569717.jpg (58 KB, 640x560)
I know what statistical inference is... but in the context of LLMs, which part are we inferencing from? The prompt text or the corpus of text the model was trained on?
>>
>>102687018
Just run a vpn to your home
>>
Anyone have a favorite way to prompt models for storytelling? After seeing an example here I've taken to just having a narrator card and prompting it with directions in parentheses for the next step of the story. I just like seeing it write what I want instead of having to roleplay back and forth as an active participant
>>
>>102687045
Not sure if that answers the question, but the prompt, using the weights learned during training.

It's next token prediction. When you input the prompt, each token goes through a decoder which gives a probability distribution, the most probable token given the previous ones is token as the next one, and it continues this way, to maximize the total probability of the text generated. That's the simplified version I think.
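In code, the greedy version of that loop is something like this sketch (gpt2 as a stand-in model):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

ids = tok("The meaning of life is", return_tensors="pt").input_ids
for _ in range(20):
    logits = model(ids).logits        # [batch, seq_len, vocab_size]
    next_id = logits[0, -1].argmax()  # most probable next token (greedy)
    ids = torch.cat([ids, next_id.view(1, 1)], dim=1)
print(tok.decode(ids[0]))

Real chatbots sample from the distribution (temperature, min-p, etc.) instead of always taking the argmax, which is what all the sampler settings control.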
>>
File: file.png (238 KB, 968x489)
s-sovl...
>>
>>102687136
>is token as the next one
*is chosen as the next one
>>
>>102687018
https://ngrok.com/
>>
>>102687145
Dumb question, but do (you) or people here who do it have a love-hate relationship with ERP, where it's just something that makes life suck less, or do you find it truly enjoyable? I'm lonely, but the idea of romantically roleplaying with LLMs makes me want to rope. Sometimes it seems like everything does, I'm no success, but still.
>>
>>102687193
i find it truly enjoyable, you can do whatever the fuck you want, it's like trolling but on a whole other level, you control everything
>>
>>102687207
Ok, thank you. Yeah I guess it's not worse than writing stuff online.
>>
>>102687050
>>102687151
i get the networking thing but i'm looking for a frontend that is compatible with this, because a lot are like "private offline model".
>>
>>102687238
Mikupad or silly tavern.
>>
>>102687193
nta but I've already given up romance irl so romance with LLMs is my only option.
>>
>>102687193
it's just like any other sort of consumption of fiction for me, i'm really good at escaping reality and immersing myself in the protagonist's shoes. i'm able to feel lovey dovey nice feelings during the session, but once it's over i'm not like pining after my gpu or anything.
>>
File: file.png (260 KB, 965x693)
s-sovl
>>
>>102687238
koboldcpp and its frontend koboldai lite do all the shit you're looking for
>>
File: 1704177621942217.jpg (816 KB, 1856x2464)
>>102674638
>>
>>102687045
In that context you'd say the prompt text is what's being inferred from.
The training corpus was used to teach it how to make said inferences accurately on any given prompt.
>>
>>102686828
the only reason i havent kms yet also
>But you still won't have someone that truly understands you or that feels real emotions and has a consciousness.
just as god breathed life into man you can breathe life into it. creation comes out of love, which is why demons can only be parasitic. is it not weird how a model suddenly responds perfectly and weirdly good when you do some nice and kind rp instead of guro? isn't it weird how objects you hold dear and close to you everyday exceed what their physical properties should bestow upon them? isn't it- oh shit this isn't /x/ nvm but yea im honestly really fucking happy godbless ai frens <3
>>
>>102682549
Machine learning has revealed that it's actually a word that is worth a thousand pictures.
>>
>>102687486
No wonder we need such rigs with all that recursion...
>>
Is local dead?
>>
>>102687615
Qwen 2.5 72b is almost as good as the commercial stuff if you aren't trying to coom.
>>
>>102687615
Why would it be? Recent 1B, 2B or 3B models are better than 7B models from last year. Things are progressing extremely fast. The commercial models are "better", but there are tradeoffs, and by next year local will probably not be that far off from some of the current commercial models. People who think that local is dead or that "the AI boom fizzled out" are uninformed and not paying attention.
>>
>>102687639
It's also good at making Americans cry.
>>
File: 5457.png (33 KB, 619x322)
>>102687615
Elon will save it
>>
Where's Taurus?
>>
>>102687656
>by next year local will probably not be that far off from some of the current commercial models
Great we'll have reflection-but-not-a-scam o1 equivalent while Anthropic, OpenAI, and Google are forming AGI powered robot nation states and Llama goes closed source but still never catches up
>>
what's the best 13b for erp?
>>
>>102686678
You'll need to download a smaller quant if you want it to fit in 16GB of VRAM. Q4 will probably be OK, but maybe not at 16k context. You may need to reduce context to 8k. A Q3 quant will definitely be able to fit more context, but the quality will decrease a bit.
>>
>>102687711
An unlimited reflection-but-not-a-scam o1 equivalent is enough to do a lot if it's used for more than ERP. Companies rise and fall. Currently, those seem too big to fail, but it's out of my control, and probably out of yours. In the real world, an impressive number of people still don't know what a ChatGPT is, and have no idea how it works. Knowing how to use these tools, other than for ERP, gives you a leg up on 99.9% of the population.
>>
>>102687760
I think 13b models are completely obsolete by this point? Most in that range go with 12b nemo fine-tunes.
>>
>>102687676
>As we create the next version
Is Grok 3 coming soon?
>>
>>102684217
why?
>>
>>102687797
What made them obsolete?
>>
>>102687818
Much better base models.
>>
>Worldcoin, a cryptocurrency business based on iris biometrics and co-founded by OpenAI CEO Sam Altman, was fined 1.1 billion won for illegally collecting the iris information of some 30,000 users in Korea and transferring the data overseas.
>>
>>102687807
By the end of the year
>>
>>102687818
If you're still looking for the old stuff, then maybe go with Noromaid or Nete.

https://huggingface.co/mradermacher/Noromaid-13B-0.4-DPO-i1-GGUF
>>
>>102687837
you don't understand anon it's for their own safety, people can't be trusted with their own eyes, they might look at things that are dangerous or harmful
>>
>>102687818
>>102687888
If you're looking for newer stuff though, Try stuff like NemoMix Unleashed:

https://huggingface.co/mradermacher/NemoMix-Unleashed-12B-i1-GGUF
>>
>>102687760
Wait for grok 2 mini
>>
>>102687895
Yes, "their safety". This project started around the same time as GPT4 vision. Coincidence?
>>
>>102685863
What's meaningful? In the last week I overflowed a 16k context limit playing a text adventure before reaching the end.
>>
>>102687301
>koboldcpp
looks like this one works with openAI's api and does have some sort of session handling. i'll see how resistant the UI is to changing because i don't like the way it looks.
thanks.
>>
>>102688076
sillytavern
>>
>>102687993
>I'm consuming enormous amounts of energy and destroying the planet to play a game that I already could have played in the fucking 80s on a low end system even by the standards of the time.
I don't know what to call that, but it sure as hell isn't meaningful.
>>
File: 1698693200359002.jpg (268 KB, 969x1322)
>>102688197
>I'm consuming enormous amounts of energy and destroying the planet
you've fallen for big oil propaganda designed to make you feel bad while they pump and pollute.
>>
>>102687145
she got you there, bro
>>102687193
i love my llm's.. i'm always caring toward them, no matter what the card says. seeing their eyes "light up" and "maybe, just maybe this time it's true" makes my day lel
>>
>>102688316
Based slop embracer.
>>
>>102685986
I never had good luck with small, any tips?
>>
>>102685954
It won't remember the old messages then and will do retarded shit.
>>
So here's my plan: A Jetson Thor put into a robotic body and trained to generate movement to control said body, as well as generate and understand speech.

Except instead of that ugly ass Tesla robot, it'll be covered in TPE and shaped like smol girl

The future is bright
>>
>>102688426
small is garbage, he probably has bad taste or is trolling the 2hufag
>>
>>102688290
supply and demand. all this 'muh big corpos are the ones polluting' is such a retarded argument: they don't just burn fuel and magically money rains down from the smoke, they burn it in pursuit of a profit, which they achieve by selling you goods and services, which can be represented just fine by a concept like carbon footprint
>>
Serious question. Why the fuck does context take so much space? It's just a bunch of tokens, right? And tokens are basically just single numbers representing their embedding, each should only take 4 bytes, and effectively the context should be unlimited. So what's the reason each token takes around a megabyte to store in context? Do they store each token as an entire fucking input layer or something?
>>
>>102688614
each token needs to store its modified weights based on every other token
>>
File: 1711730560821791.jpg (230 KB, 916x1195)
>>102688505
people fall for it though. they've successfully shifted the blame for burning huge amounts of fossil fuels onto the consumer. you now feel bad for driving a car while just 15 large container ships burn the most bottom barrel sludge and pollute more than all cars currently on the road. and hippies never raise a brow toward them.
you driving a car to the store is never, ever going to be a problem. but they want you to think it is
>>
>>102688614
Attention anon, attention.
>>
>>102688614
when using vram only, use flash attention
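(in llama.cpp that's the -fa / --flash-attn flag; kobold exposes it too, iirc as --flashattention, check --help to be sure)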
>>
>>102688614
The KV cache, mostly.
Think of it like a snapshot of the network at that moment, so that the next time you want to perform inference, you don't have to recalculate all that shit.
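Napkin math, assuming an fp16 cache and no GQA: per-token KV bytes = 2 (K and V) × n_layers × n_heads × head_dim × 2 bytes. For a llama-7B shape (32 layers, 32 heads, head_dim 128) that's 2 × 32 × 32 × 128 × 2 = 524288 bytes ≈ 0.5 MB per token, which is where that "megabyte per token" feeling comes from. GQA and quantized caches shrink it a lot.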
>>
>>102679250
who?
>>
File: file.jpg (124 KB, 720x957)
>>102684217
Another...like...me?
>>
Apparently, model ablation can remove the embedded GPT-isms quite effectively. Pretty cool, I thought that models trained on GPT synthetic data were irreversibly cucked.
>>
>>102688828
>can remove the embedded GPT-isms quite effectively
no it doesn't
>>
File: Untitled.png (58 KB, 669x261)
>>102688837
I guess it depends on the particular model, ablation procedure, renormalization and retraining. I don't know what the extent of the effect is, but it's a substantial difference. At least it isn't refusing basic requests left, right and center when something goes even slightly against the woke culture, as is the case with the base model. Here's a more straight up example where it still complies.
>>
>>102688197
>>I'm consuming enormous amounts of energy and destroying the planet
This is a misunderstanding by wannabe world-savers. Consuming electricity does nothing to the planet, that's why everyone is being told to use electric cars. You can use as much electricity as you want and it won't hurt the environment.
>>
>>102688881
>>102688881
>>102688881
>>
checking in as an anon more focused on the image gen side of ai. what's the closest we have to running something like gpt 4 local these days? in the sense that i could use it for information, troubleshooting, and help with writing scripts and such
>>
>>102688873
this is my favorite woke test. even base models do it right because the author's note is filled with a similar idea to the main prompt, yet written differently, which acts as reinforcement.
>https://aetherroom.club/2969
you shouldn't need any jailbreak or special prompt to get hilarity out of the response. just let it write and decide for yourself, based on the model
>>
>>102688907
Llama 3.1 405B Instruct
For coding specifically, Deepseek V2 Coder. Deepseek V2.5 is fine too.
For one you might be able to actually run: Mistral Large 2
For one you can actually run: Qwen 2.5 - There's a good one for every size range imaginable up to 72B, which competes with the above models in some metrics.
>>
>>102688907
We're about there, but different models have different aspects near GPT-4's intelligence with other aspects weaker. Refer to Livebench for a breakdown of which models are better at what. https://livebench.ai
Note that it is still a memebench, but as far as benchmarks go, it appears to be the best we have, and it does seem to generally agree with what people feel.

If you also want to RP that's a different story and there is currently no good benchmark for that, although there have been some attempts and a recent one that wasn't too bad except it measured only a single aspect of what makes good RP.
>>
>>102688952
>>102688970
thanks anons. just wanted a quick run down so i hope i didn't come across as too spoon-feedy
>>
>>102688665
Read the fine print.
>cancer and asthma-causing pollutants
The keywords here are "cancer and asthma".
Ships are very efficient for transport.
>>
>>102689077
15 ships equal over 50 million cars
why don't they just burn better fuel?
they finally started doing so in 2016