/g/ - Technology

File: 1718583560931465.png (1.53 MB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101722144 & >>101711798

►News
>(07/31) Google releases Gemma 2 2B, ShieldGemma, and Gemma Scope: https://developers.googleblog.com/en/smaller-safer-more-transparent-advancing-responsible-ai-with-gemma
>(07/27) Llama 3.1 rope scaling merged: https://github.com/ggerganov/llama.cpp/pull/8676
>(07/26) Cyberagent releases Japanese fine-tune model: https://hf.co/cyberagent/Llama-3.1-70B-Japanese-Instruct-2407
>(07/25) BAAI & TeleAI release 1T parameter model: https://hf.co/CofeAI/Tele-FLM-1T
>(07/24) Mistral Large 2 123B released: https://hf.co/mistralai/Mistral-Large-Instruct-2407

►News Archive: https://rentry.org/lmg-news-archive
►FAQ: https://wikia.schneedc.com
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Programming: https://hf.co/spaces/bigcode/bigcode-models-leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>101722144

--VRAM vs base RAM for AI model performance: >>101724758 >>101724830 >>101724881 >>101725011 >>101725115 >>101725253 >>101725293 >>101725395 >>101725304 >>101725049 >>101725256
--Merging model parts with llama.cpp or kobold.cpp: >>101723086 >>101723286 >>101723425 >>101723461 >>101723732 >>101723506 >>101725458
--Measuring AI model performance: time vs flops and grading answers: >>101725583 >>101725750
--Disabling mmap solves oobabooga loading issue with large model: >>101724028 >>101724374 >>101725455 >>101725603 >>101725685 >>101725716 >>101724848
--Running 405B model on consumer hardware is possible but slow: >>101728928 >>101728969 >>101728957 >>101728974 >>101728996
--Largestral 2.65bpw model shows promise for RPing with high context: >>101725065 >>101726288
--Img2img workflow shared and troubleshot: >>101723532 >>101723678 >>101724684 >>101724742 >>101725219 >>101725262
--Anon discusses the rumored NVIDIA GeForce RTX 5090 specs and potential upgrade value: >>101723601 >>101723647 >>101724206 >>101725255 >>101728450
--Anime character cloning into AI bot concept discussed: >>101728689 >>101728723 >>101728952
--Smoothing issue in ST, possible loader problem: >>101727395 >>101727523 >>101727629
--Nala test card found in Character Hub archives: >>101726757 >>101726810 >>101727544
--Gemma 2.7B model fails to run on 4090 GPU: >>101729309 >>101729326 >>101729332 >>101729345 >>101729410 >>101729499 >>101729476 >>101729506 >>101729552 >>101729854 >>101730417 >>101730494 >>101730520 >>101730585
--Flux can be trained using SimpleTuner fine-tuning kit: >>101724480 >>101725668
--Anons discuss using LLMs in learning and work, with mixed opinions on benefits and risks: >>101722324 >>101722604 >>101723384 >>101723872 >>101723933
--Miku (free space): >>101722488 >>101723653 >>101725929 >>101727373 >>101730207

►Recent Highlight Posts from the Previous Thread: >>101722145
>>
Nigger
>>
cough
>>
AI is fake, your waifu is a stochastic parrot, and you're jerking off to matrix multiplication.
>>
>>101732172
Is mistral large v2 the best model for rp right now?
>>
>>101732469
try it and see
>>
>>101732469
I have one PC with 3x 24GB cards and a 128GB macbook M3M so I was considering running it.

Are there any good finetunes?
I've been spoiled by Sonnet 3.5.
>>
>>101732405
>AI is fake, your waifu is a stochastic parrot, and you're jerking off to matrix multiplication.
Lucky me. All of those are my fetishes.
>>
File: 1705099304381407.jpg (139 KB, 1158x1114)
>>101732405
Worry not, this fad is dying fast.
>>
>>101732405
yes, and?
>>
>>101732781
kek bitter nocoiner
>>
>>101732781
I for one would love cheap large vram GPUs being dumped onto the market if the bubble bursts.
>>
>>101732805
don't look at the charts right now anon...
>>
>>101732405
still makes me cum
>>
File: 1711323934326662.jpg (272 KB, 1280x1061)
>>101732781
>>
>>101726717
oi whats chatbox? is that a sillytavern kind of frontend?
>>
>>101733041
time to slurp the dip
>>
>>101733041
won't need any of those in a post-scarcity society
>>
>>101732179
TESS L3.1 70B
https://huggingface.co/migtissera/Tess-3-Llama-3.1-70B
>>
>>101733041
what's up?
>>
What is the best Mistral Large for Vramlets with a single 4090?
>>
>>101732405
Real life women also do matrix multiplications in their brain. And they use most of their compute to cheat on you and exploit you.
>>
>>101733174
Intel shat in their pants so the other giants are getting hit in collateral damage
>>
File: 1698038991465318.jpg (333 KB, 1070x1152)
>>101733276
Just like AI, lying and gaslighting.
>>
>>101733275
The one that fits your poverty build?
>>
>>101733277
no, I mean, what stocks are up?
>>
>>101733298
>he will be reposting this retarded screen to the end of his life
is your ego so fragile that you keep getting triggered by a bunch of numbers?
>>
Is there a UI/backend that can display the attention associations between tokens? Is that even something that makes sense since there are many layers and parallel heads?
I'm really curious to know _why_ it's generating a specific token.
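For what it's worth, the raw weights are exposed at the library level, just not in the usual frontends. A minimal sketch with HF transformers (gpt2 here as a stand-in; averaging over heads is a crude summary of what is really many separate association maps, which is the caveat you already guessed):

[code]
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"  # stand-in; any HF causal LM works the same way
tok = AutoTokenizer.from_pretrained(name)
# eager attention is needed; sdpa/flash kernels don't return the weights
model = AutoModelForCausalLM.from_pretrained(name, attn_implementation="eager")

ids = tok("The quick brown fox jumps", return_tensors="pt").input_ids
with torch.no_grad():
    out = model(ids, output_attentions=True)

# out.attentions: one (batch, heads, seq, seq) tensor per layer.
# Head-averaged last layer as a single crude association matrix:
attn = out.attentions[-1].mean(dim=1)[0]
toks = tok.convert_ids_to_tokens(ids[0])
for i, t in enumerate(toks):
    j = attn[i].argmax().item()
    print(f"{t!r} attends most to {toks[j]!r}")
[/code]

bertviz does the interactive per-head/per-layer version of this if you want to actually stare at it.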
>>
Crazy to think we're so close to a 12b standard future. Flux coming out is only solidifying this theory for me, after llama 3/Nemo.
>>
>>101733379
I wish my needs were basic enough that a 12b llm was able to satisfy me.
>>
>https://huggingface.co/Gryphe/Pantheon-RP-1.5-12b-Nemo
Okay, that's pretty cool.
Two passes of training with two datasets, with one being half instruct.
Another one for my testing list.
>>
>>101733421
That schizo tulpa shit is a very weird grift. I prefer the regular kofi encouragement methods.
>>
>>101733504
I prefer when he made MythoMax
>>
>>101733518
That too. But if he keeps going this route he could at least add Undi the bumbling belgian persona.
>>
>>101733518
mythomax is a meme, it wasn't even good compared to other L2 finetunes
>>
>>101733298
Can I be proud to post blurry JPGs with garbage quality?
>>
>>101732811
this, imagine the H100 panic sell (not going to happen)
>>
>>101732781
Good, more GPUs for me.
>>
>>101732405
>jerking off to matrix multiplication.
hot
>>
>>101733504
I think so too, but I'm interested in the model because he seems to at least have something approaching a method and an actual concrete final result in mind.
These models are small enough that I can just download and test as many of them as I want anyhow, so I'll at least give it a fair shake.
Here's hoping it's not dumb as bricks.

>>101733518
That's the mythomax guy? Huh.
>>
>>101733298
I don't like /pol, but I don't like AI reflecting DEI hypocrisy, either.
>>
What is the best Mixtral 8x7b finetune for ERP?
>>
>>101733518
>MythoMax
It was a merged frankenmodel that took the finetuning work of others. I never got the hype since I was able to run 65b back then.
>>
>>101733896
https://huggingface.co/vicgalle/Merge-Mixtral-Prometheus-8x7B
https://huggingface.co/papers/2406.07188
>>
>>101733899
>frankenmodel
no, just a regular merge, also it was the l2 era, so if you were running the "large" models of the time it'd be 70b, poser newfriend
>>
>>101733896
limarp zloss.
>>
>>101733979
If I recall there weren't many L2 70B coomtunes at that time.
>>
undster ownage throwback
>>
LLM cooming is in a constant state of tech demo for me. Every time I download a new generation model I have fun with it 2-3 times. I notice a marginal improvement and then I get tired of handholding it through the process of making me coom. It is all so tiresome...
>>
>>101734035
And he is still here... He must have a humiliation fetish.
>>
File: file.png (85 KB, 780x691)
grim, literally
>>
File: file.png (38 KB, 787x309)
>Sadly repetition is a typical issue with Mistral-trained models, and hard to get rid of.
There, trust the experts: Mistral confirmed repeating mess
>>
>>101734050
What is needed is something with the basic structure of Corruption of Champions/Trials in Tainted Space/Lilith's Throne but with an LLM to RP the characters.
>>
File: file.png (25 KB, 400x400)
>>101733041
>>
File: 1000033112.jpg (315 KB, 1080x2412)
does anyone know what the duck this Indian guy is on about?
>>
>>101733174
>>101733277
Wait wait wait. Isn't there at least one of those companies that doesn't use intel but they got hit too because computers?
>>
>>101734228
he's talking about using RAG with a pdf file
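For anyone else wondering, the pipeline behind that buzzword is small. A minimal sketch assuming pypdf and sentence-transformers are installed; the file name is a placeholder and chunking by page is the crudest possible strategy, real setups split smarter:

[code]
from pypdf import PdfReader
from sentence_transformers import SentenceTransformer, util

reader = PdfReader("manual.pdf")  # hypothetical input file
chunks = [t for p in reader.pages if (t := p.extract_text())]

emb = SentenceTransformer("all-MiniLM-L6-v2")
vecs = emb.encode(chunks, convert_to_tensor=True)

query = "how do I reset the device?"
hits = util.semantic_search(emb.encode(query, convert_to_tensor=True),
                            vecs, top_k=3)[0]

# Paste the retrieved chunks above the question in whatever prompt
# format your local model expects.
context = "\n\n".join(chunks[h["corpus_id"]] for h in hits)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
[/code]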
>>
>>101734237
ahh OK.
>>
>>101734236
investors retarded, news at 11
>>
I know 70b+ users will seethe in response to this, but I'm honestly starting to think that OG Mixtral was peak local. I've tried multiple finetunes of both Gemma and L3, and they were all disappointing by comparison.
>>
>>101734340
Dolphin mixtral 2.5 was local peak it's all downhill from that.
>>
>touch me, make me feel alive
Bitch, are you undead? I hate these fucking cliches so much.
>>
>>101734340
that's surprising because, while being fast and smart, Mixtral is incredibly boring and doesn't have a single good finetune
>>
>>101734340
Just need mixtral but trained on the new dataset (what they did with nemo)
>>
>>101734340
I believe you because if mixtral is peak then it is absolutely over. And it is absolutely over.
>>
>>101734408
That's why I'm downloading Limarp-ZLOSS. Dolphin is both smart and compliant, but it doesn't have a sufficiently perverted vocabulary to be a good coombot. A Mixtral finetune with Dolphin's compliance and MLewd/Noromaid's perversion would be sublime.
>>
>>101734340
>>101734379
>>101734430
>>101734451
totally organic and the mention of an ((undisloppa)) model towards the end of that lost post doesn't give it away immediately
>>
>>101734389
Next time a bot says that just randomly pull out exorcism items and say "wtf I thought you were already alive, undead bitch" and exorcise that fool.
>>
>>101734476
medications reminder sir
>>
>>101734451
Where's mixtral magnum btw? MistralAI even released the official finetune guide seeing how people struggled with it
>>
>>101734495
and i don't need to remind you to take yours :)
>>
>>101733174
Nobody answered correctly. The reason is Japan
>>
>>101734524
based nips crashing the economy with no survivors
>>
>-10% on overpriced tech stocks
>crash
>>
Guess I can try llama 3 8.1 now.
>>
>>101734570
10% is the max they're allowed to drop
this is literally 'not great not terrible' territory
>>
>>101734476
The only thing that really bugs me about these one or two schizos who develop seething obsessions with specific people, is that you can tell how morally justified they think they are, instead of realising that it's actually because they have no other reason to exist.
>>
>>101734575
>llama 3 8.1 now.
Was I a heavily quantized LLM all along?
How dire.
>>
Bought an ultimate gaming rig, and what models can I run?

- 24" Viewsonic flatscreen monitor
- Dell gaming mouse
- Windows 11
- Intel 14600KF gaming processor
- Memory with 64 gigabytes of DDR5 ram power
- One TERABYTE Solid state Drive
- BluRay player
- Seasonic power supply with 650 wattage
- Sound Blaster sound card
- GeForce RTX 3060 with 12 GB of DDR5 vram power
>>
>>101734635
>only 24"
ngmi, you need bigger screen to run bigger models
>>
>>101734635
I would say lol but you are just baiting.
>>
>>101734635
https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1
>>
>>101734502
Wasn't the official guide "tune it several times and test the results to see which one randomly came out best"? It turned out Mixtral had no secret to getting it right every time. It's highly RNG. That's (a) time consuming and (b) it completely rules out ESL shitters who excrete downgrade-tunes they don't and can't personally test.
>>
>>101734707
Based, finally ERP on my phone
>>
>>101732781
>1.2 Trillion wiped out in 12 hours
>>
Other than ERPing, what do you use bots for?
Now that the the stock market crashed and VC money runs dry, what happens to LLM development?
>>
File: 1638203703139.png (519 KB, 701x622)
>go back into my chat logs to look at some kino i missed
>hmm this is a nice log from march 25th..
>hunt down the model i used
>Cerebrum-1.0 8x7b
>(((4096 context)))
how the fuck?
>>
https://www.youtube.com/watch?v=EpRRwgyeBak

Thoughts?
>>
File: its called depression.png (54 KB, 261x405)
>>101734476
>>101734599
I honestly have to half take this back, I respect if people still use mixtral, I just don't respect undisloppers.
looking at this log >>101734796
and comparing my recent ones, good fucking lord, the new models are completely unusable by comparison because they hit repetition traps relatively early into the context.
That mixtral wasn't even trained for anything over 4096 and my log was almost 10k tokens exactly, no repetition, perfectly in character.
Fuck i feel like shit right now.
>>
>>101734635
Those guys are funny though
>>
File: kevin-flynn+.jpg (9 KB, 474x238)
>>101734788
If I had as much compute as OpenAI, I've often thought that I'd use it to host a card modelled on Kevin Flynn from TRON, give him the complete source code of Blender, UE5, and whatever else he might need via RAG, and then tell him to invent the Grid IRL; or at least succeed with the Metaverse where DataZuck failed.
>>
File: ssdd.png (233 KB, 556x690)
What copypasta jailbreaks Qwen2 72B Instruct?
>>
>>101734796
>8x7b
>4096 context
uh no? all mixtrals are 32k...
> "max_position_embeddings": 32768,
https://huggingface.co/AetherResearch/Cerebrum-1.0-8x7b/blob/main/config.json
>>
>>101734839
>I honestly have to half take this back, I respect if people still use mixtral, I just don't respect undisloppers.
Then you're still mentally ill. MLewd was awesome, and I will ignore anyone who says otherwise.
>>
File: front facing alan 1.png (180 KB, 463x554)
>>101734881
You don't need jailbreaks for local models..
>>
>>101734903
>You don't need jailbreaks for local models..
>>
>>101734895
go bak petrus
>>
>>101734903
God if only
>>
>>101734913
We both know that you lie awake at night, dreaming about sucking my dick.
>>
>>101734890
my bad im going a bit retarded having to read some other thread about local models + undi poster popped a few of my working brain cells
yes, mixtral was 32k, but i faintly remember something about it not doing well up to that context, i always used it at 16k. Could be Mandela effect memory.
>>
>>101734943
>yes, mixtral was 32k, but i faintly remember something about it not doing well up to that context, i always used it at 16k. Could be mandella effect memory.
Claimed 32k, actual 32k yeah you're tripping.
https://github.com/hsiehjackson/RULER
>>
It's time to admit that the smarter a model is the less sovl it has. Smart models tend to associate input with certain concepts they have containerized. Oh, this personality is trope X. This scene is cliche Y. Let's draw inspiration from the most uninspired slop associated with it in my pretraining dataset.

Dumb models were like "Ummm I'm not sure what to make of that, let's see: *outputs a token sequence of its schizo interpretation that ranges from underwhelming to absolute kino*
>>
So what's the consensus? Retvrn to mixtral?
>>
>>101734707
Is this actually good enough for roleplaying?
>>
>>101734982
If you believe in Petrus sure, though he also shills a 4k context l2 merge, so there's that.
>>
>>101734985
It's 2B. What do you think?
>>
>>101734976
>It's time to admit that the smarter a model is the less sovl it has.
I think it's a combination of training data and intelligence. Models are more intelligent now, but I suspect that training data is only going to get worse over time, rather than better. It's the same copy degradation problem that makes genetic cloning a bad idea.
>>
>>101734968
Knew i wasnt just going crazy/forgetting things that werent even a year ago yet.
That said again, it holds up very impressively and can make shit up on the spot, which is part of the log that made me do a double take. Not once have i had recent models do anything remotely creative like that.

>>101734976
Starting to suspect this a tiny bit, While i don't think we'll get anywhere going back to models that are lobotomized by comparison when taking into account "intelligence", I do think when models are able to do shit in a relaxed sort of way they end up a lot better.
Sterile is almost always worse than dumb, which is why mythomax to anyone here that isnt a newfag and retarded is king of the earlier models.
>>101735022
this is what i'm suspecting more than his theory about intelligence. It has to be the training data.
>>
>>101735001
I would never recommend a 2B model for either roleplay or anything else. I also used the larger GemmaSutra and found it underwhelming. Text quality was fine, but it's very, very "meh." Nothing remarkable at all.
>>
>>101732860
oh no. anyway. (anon, i started buying in 2015 and i didnt stop)
>>
>>101734976
Non pure transformer slop AI will be able to make sovl of the caliber never before seen
>>
>>101735033
>It's has to be the training data.
Of course it is, we're filtering more and more stuff. For a truly great model you should want it to know everything it can; a true "internet archive" model would be insane at roleplay since it would have seen schizo rants, random fics of obscure series, etc. Claude clearly doesn't do (much) filtering in pre-training; they lobotomize it afterwards in safety tuning.
>>
>>101734895
Do people like reading all-orange text like that?
>>
>>101735033
>Sterile is almost always worse than dumb, which is why mythomax to anyone here that isnt a newfag and retarded is king of the earlier models.
Yes and no. I don't mind a smart sterile/dry base model if the splicer Anons (Undi, NeverSleep, bartowski, ehartford, DoctorShotgun, the Nous boys etc) can create good finetunes which give it back vocabulary.
>>
>>101735127
Undi, Undi and Undi-Jr, a quanter? Dolphin gptsloppa? that's who you're putting your "thrust" in?
>>
>>101735127
>nous boys
teknium's shit is undistilled gptslop of the highest order
>>
>>101735127
>can create good finetunes which give it back vocabulary
they can't, which is why they're memes.
Those sloppa tunes were always the worst ones and why im raving about Cerebrum right now looking back. They were always the models with the worst slopism issues, saw most of my shivers down the quivering petite frames and ministrations from them.
Dumb 13bs were and still are better than anything from those faggots.
>>
>>101734985
>>101735007
it's not bad as long as you don't use a complex prompt... it's surprisingly good for a small model
>>
Has there been a single finetune that made you go "wow, this is impressive"? To me, it seems like every fine tune is either identical to the original model or dumber.
>>
>>101735201
guanaco 65b
>>
>>101735201
I'll be called a Shill but Stheno, specifically 3.2 as 3.3 did make it a lot dumber, at least for the specific card I use to test these models.
That said, the model really isn't perfect. It's very one note in its tone and crazy horny, but it worked fine for what I was trying to do, at least better than other models on the same range.
Now I'm hoping Nemo and its fine tunes will perform better.
Mixtral limarp zloss wasn't wow worthy, but it was a direct upgrade over mixtral-instruct as far as I could tell.
>>
>>101735201
SPPO is black magic to me
>>
File: gpt-3.png (128 KB, 942x286)
First contact gpt
>>
>>101735201
As a model maker who's been doing this since llama 1, improving models has become harder with each generation. It's like trying to teach a kid a new skill (llama 1 era) vs trying to teach a 95 year old senile old bastard a new skill (llama 3+).

Still trying though. My depraved dataset is niche enough that I doubt it'll be replaced completely with base models anytime soon. The gap is closing, though, which I think goes to show how much models are improving each generation.
>>
>>101735148
>>101735155
https://www.youtube.com/watch?v=Z57Nqki0FuI
>>
>>101735201
Nearly every "fine tune" I use gets repetitive.
>>
>>101735262
>>760Mytes
>>1.3Bytes
Things can only go up from here
>>
>>101735276
hi moxxie! dory 12b is trash btw
>>
>>101735271
>As a model maker who's been doing this since llama 1, improving models has become harder with each generation. It's like trying to teach a kid a new skill (llama 1 era) vs trying to teach a 95 year old senile old bastard a new skill (llama 3+).
Is there evidence that Meta have deliberately tried to make it more difficult?
>>
>>101735313
I don't think they have. I think it's a matter of the early era models being unsaturated, thus responsive to new data, whereas models are closer to saturation with each generation, so it's harder to teach them. Especially if you are trying to teach them something they were deliberately excluded from seeing, such as NSFW content.
>>
>>101735313
>Is there evidence that Meta have deliberately tried to make it more difficult?
No? The models are just smarter/better by default, so it's harder to improve on already better models? Also, your tunes aren't making a dent in L3's 15T tokens; Tess-3 405's data was what, 0.005% of that or something?
>>
Fine-tuning allows the model to exploit spurious correlations, which lead to bad out-of-distribution performance.
>>
>>101735313
yes especially google models detect it
>>
I thought finetuning was still valid since the official instructs were aligned to be cucked?
>>
>>101735348
> The models are just smarter/better by default so it's harder to improve on already better models?
Yes.
> Also your tunes aren't making a dent in 15T tokens of L3 tess 3 405 was what 0.005% of that or something?
A dent they are making. The problem is whether the dent is:
>>101735351
>Fine-tuning allows the model to exploit spurious correlations, which lead to bad out-of-distribution performance.
I think this can be the case, but I think you can alleviate it to some degree. I always put MMLU Pro benchmarks on all my models alongside the base model they were trained against. It's not perfect but it gives you an idea of what was sacrificed to make the model not shit at creative writing.
>>
>>101735383
I always use gpt2
>>
>>101735313
If it wasn't - we'd have the best model by now.
>>
File: file.png (848 KB, 1268x800)
>>101735007
>What do you think?
I am thinking SEEEEEEEXXXXXXXXX
>>
>>101735444
>the best model
What would that even be like?
>>
As model capacity increases, the risk of memorization increases.
>>
>>101732781
>useless hardware
No one ever needs more general purpose compute.
>>
>>101734788
>what happens to LLM development?
With the current state of the world I think that all development will grind to a halt now. We will get some nice 5-10 years of stagnation and abysmal progress because it didn't bring the cash in quickly enough. And the reality is that while LLMs could be a dead end, the whole neural net thing has a lot more potential. It just needs a different approach than predict-the-next-token. But good luck explaining that to retards who have money and want more money back already.
>>
>>101732179
Thanks for the recap.
I sometimes miss topics even though I was in the thread all day.
>>
tfw years of proompting and cooming to ai chats and now for the first time i'm starting to reach the 8k context limit i set
it's so joever, i'll never be able to coom to sub-2k "stories" ever again
>>
File: 385.png (2.21 MB, 920x1296)
>>101733041
well, at least we got some nice toys to play with before everything comes crumbling down
>>
>>101733159
Is it better than the normal Instruct?
>>
>>101734084
He actually does something, though. Unlike most shitters here
>>
File: file.png (159 KB, 270x380)
>>101734796
>(((4096 context)))
>how the fuck?
One of the first LLM enlightenment steps is the realization that more context isn't always better. It gives the model an opportunity to shit out more of its most common phrases and once it does that it will pick up on them being in the text multiple times. This creates a shivertastic feedback loop. You would need to explicitly train for lack of repetition in long context training examples. And then you would need to go a step further and teach it to distinguish between repeating formatting but not repeating stuff you don't want repeated. I don't think companies creating coding / assistant bots have any incentive to do that. You will never get the perfect coombot you want from any of the big companies unless they decide to make a coombot. Owari.
>>
File: cromartie reaction 1.jpg (20 KB, 640x360)
No question about it, looks like my system's gotten slower. Or, it's some fuckery with ((python))/((nvidia drivers)). Even that cerebrum model is egregiously slow now, it's taking 30 minutes to even load the first 2048ctx. Thought there had to be an explanation for why even nemo/llama3.1 were super slow regardless of the context, given those are low parameter.
I have no idea where to even begin conclusively troubleshooting this and narrowing down the problem. Could someone guide me a bit here?
>>
>>101735677
Not doing anything and not spreading placebo is better than doing something and spreading placebo.
>>
>>101735674
Judging by what he said about his 405 tune I'd doubt it.
>Each Tess version (v1.0, v1.5, v3.0) uses a new and improved dataset. Tess-3 has 500K samples of 16K context length, distilled from Opus-3, Sonnet-3.5, Nemotron, GPT4-Turbo and DeepSeek Coder-V2. Then the samples go through filtering, sometimes manually. Just to say that it’s not the same datasets as previous models.
>It is trained with QLoRA
>This model is quite something, and very special!
>Uncensored my man. There’s no censorship or biases in my models.
>>
>>101735705
>placebo
Erm Mistral llama is /pol ready chud!
>>96345096
>Mistal-Llama is fully /pol ready.
>>
>>101735201
SuperCOT
>>
Experimenting with heavy Top A and minor Typical P on Dolphin Mixtral 2.5, and moving them up in the sampler order accordingly. Min P 0.05, DynTemp 0.4-2.35, Smoothing 0.24, Mirostat 2 5 0.95.

I know some of you will still consider this very purple prose, but I haven't seen this little repetition for a very long time.
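For reference, most of these samplers are variants of one move: cut the low-probability tail, renormalize, sample. A from-scratch toy sketch of min-p alone, not any backend's actual code:

[code]
import numpy as np

def min_p_filter(logits, min_p=0.05):
    """Mask tokens whose probability is below min_p times the top token's."""
    p = np.exp(logits - logits.max())
    p /= p.sum()
    out = logits.astype(float)
    out[p < min_p * p.max()] = -np.inf
    return out

def sample(logits, temperature=1.0, rng=np.random.default_rng(0)):
    z = min_p_filter(logits) / temperature
    p = np.exp(z - z[np.isfinite(z)].max())
    p /= p.sum()
    return int(rng.choice(len(p), p=p))

logits = np.array([4.0, 3.5, 2.0, 0.5, -1.0])
print(sample(logits, temperature=2.35))  # high temp, but the tail is already gone
[/code]

With the tail cut first, even a 2.35 temperature only reshuffles the survivors, which is why the crank-temp-then-filter approach holds together at all.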
>>
>>101735890
Aren't TopA and MinP basically doing the same thing? The only difference is how the value pushes the cutoff forward.
>>
MOGS Everything released before it:
https://huggingface.co/concedo/KobbleSmall-2B
>Training was done in under 3 hours on a single NVIDIA T4 GPU with qLora (LR 1.5e-4, rank 16, alpha 16, batch size 2, gradient acc. 4, 2048 ctx).
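For anyone curious, those quoted numbers map almost one-to-one onto the usual peft + transformers stack. A hedged sketch: dropout, target modules and epoch count are assumptions not in the card, and the 4-bit loading plus max_seq_length=2048 would live in the bitsandbytes quantization config and trl's SFTTrainer, which aren't shown:

[code]
from peft import LoraConfig
from transformers import TrainingArguments

# qLora, LR 1.5e-4, rank 16, alpha 16, batch size 2, grad acc 4 (per the card)
lora = LoraConfig(
    r=16,
    lora_alpha=16,
    lora_dropout=0.05,                    # assumption, not stated
    target_modules=["q_proj", "v_proj"],  # assumption, varies by recipe
    task_type="CAUSAL_LM",
)

args = TrainingArguments(
    output_dir="kobble-sft",
    learning_rate=1.5e-4,
    per_device_train_batch_size=2,
    gradient_accumulation_steps=4,  # effective batch of 8
    num_train_epochs=1,             # assumption
    fp16=True,                      # a T4 has no bf16
)
# trl's SFTTrainer(model, args=args, peft_config=lora, ...) with the model
# loaded in 4-bit and max_seq_length=2048 completes the recipe.
[/code]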
>>
>>101735940

https://artefact2.github.io/llm-sampling/index.xhtml

At least according to this, they're basically all doing the same thing, just in slightly different ways. I used to think that just turning them all off was better, but as that screenshot demonstrates, using a few of them apparently can remove slop.
>>
>>101735313
I don't think they made it intentionally more difficult per se, but they've clearly (openly, read the papers they publish) been filtering the source data more and more, purging anything even remotely "problematic" and removing any websites that contain "bad" content.

I believe that, the more filtered the pretraining dataset is, the harder it is to bring out "bad" behaviour in the model because it doesn't have it in its "memory".
Finetuning works better at reinforcing what's already there.
>>
File: topA.png (114 KB, 874x976)
>>101735940
Yep.
Both remove low probability tokens in different ways.
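Concretely, the difference is just the shape of the cutoff. A toy sketch of the two rules as usually described, with Top-A's threshold scaling with the square of the top probability and Min-P's scaling linearly:

[code]
import numpy as np

def keep_mask(probs, top_a=None, min_p=None):
    pmax = probs.max()
    if top_a is not None:
        return probs >= top_a * pmax ** 2  # quadratic: looser when the model is unsure
    return probs >= min_p * pmax           # linear: fixed fraction of the winner

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(keep_mask(probs, top_a=0.2))  # cutoff 0.05 -> keeps all five
print(keep_mask(probs, min_p=0.2))  # cutoff 0.10 -> drops the 0.05 token
[/code]

So on a flat distribution Top-A's quadratic threshold collapses toward zero and barely prunes, while a confident distribution gets pruned hard; Min-P always keeps the same fraction of the winner.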
>>
>>101735890
sampler soup
>>
>>101735972 (me)

What I'm observing at the moment at least, is that the way to get good results seems to be to crank temperature as high as I can before complete incoherence, (which for me means 2.35) and then very selectively use some of the filters to remove repetition and stereotyped expressions. Mind you, there are still times when I get repeated paragraphs and other weirdness, and I don't claim to know why, yet. I suspect that even though Dolphin's outright refusal text has been removed, in some cases it can use repetition as a means of soft refusal/filtering.
>>
>>101735984
But you don't want to add anything, so shut the fuck up.
>>
>>101735943
Ok concedo
>>
>>101735943
KOBO GEMMASUTRA 2B > KOBBLESHIT
>>
>>101736018
>What I'm observing at the moment at least, is that the way to get good results seems to be to crank temperature as high as I can before complete incoherence, (which for me means 2.35) and then very selectively use some of the filters to remove repetition and stereotyped expressions. Mind you, there are still times when I get repeated paragraphs and other weirdness, and I don't claim to know why, yet. I suspect that even though Dolphin's outright refusal text has been removed, in some cases it can use repetition as a means of soft refusal/filtering.
>>99861949
>>99829692
>>99824769
>>99821409
>>99821121
>>99820928
>>99805819
Petrus rediscovers the "meta" that was used during smooth sampling shill era.
>>
>>101736074
proof first
>>
File: file.png (22 KB, 273x331)
concedo's designed mind
>>
>>101735943
post logs.
>>
>>101736113
Discord cult
>>
>>101736100
>>
>>101736113
welcome to llms, where you either have a computer scientist reading papers for breakfast and being the most boring guy alive, or this... no in-between.
>>
>>101736113
I refuse to believe these are real people. No one speaks like that.
>>
>>101735890
>someone as large as anon
Immersion ruined
>>
>>101736085
You're only making yourself look bad, Anon; not me. I'm not the only one who has told you that you are obsessed.
>>
i still don't get why open source tards run samplers
when i run claude or gpt4 over proxy all it offers is topp and temperature so thats all you need
if those others samplers where so good then openai and claude would offer those too
>>
>>101736140
k but is the post wrong tho? principle of anon and shit???
>>
>>101736141
The API does offer sampler settings so...
>>
>>101736154
topp temperature and penalty yes but none of those others that people pretend are good
>>
>>101736141
I used to think temperature was all you needed, but the people who use that alone will still complain about stereotyped phrases; "ministrations, shivers down spines," etc. Samplers can work sometimes to get rid of those, although it also seems to depend on how the card is written as well.
>>
>>101736141
They use samplers, you just don't get to control all of them. You have no idea what they're doing under the hood.
>>
>>101736180
>although it also seems to depend on how the card is written as well.
+oh no
.oh no
>>
>>101736135
They speak like that because kobold is a discord cult, and in there it makes sense to obsess over "kobo", "kobo won" and whatever
>>
>>101736135
the cult leader made an AI-generated music for his Tiny release
>https://cdn-uploads.huggingface.co/production/uploads/63cd4b6d1c8a5d1d7d76a778/zjHfohCnEu2Y9CWSWgf0n.mp4
>kobbo kobbo tiny
>>
File: anakin genuine disgust.gif (1.52 MB, 268x268)
>>101736197
bro i just fuckin commented on that post to make myself laugh. I can believe there's discord cults for everything but don't lump me in with these nutcases.
>>
>>101735485
Is that AI generated? Sauce?
>>
>>101736208
>I was just pretending to be retarded
>>
>>101736235
>KOBO
>>
>>101736235
No i wasn't pretending i did it because i find it funny.
Now explain the kobo thing because i am not going to be called a discordfag and take it sitting down.
>>
>>101736207
Kobble won
>>
possibly a retarded question
but is there no sort of "hybrid chat" mode in ST (or similar tools)?
"hybrid" meaning something along the lines of:
>start normal text adventure with the AI playing my narrator """character""" (not really a character per se, just used to generate text), this is in "story view" without message bubbles, avatars or anything
>the AI generates the usual "You are [character], [description] [setting]" etc.
>after writing the intro the AI provides 4 suggestions or the option to type your own action
>performing an action might get you into dialogue with an NPC
[YOU ARE HERE]
>the ai automatically switches to "chat mode" with the NPC - this DOES NOT have to be a fully visible interface change (though it would be nice), as long as it lets you in some way determine exactly what to say for each response while in conversation with that NPC
>once the conversation has ended it switches back to "narrator" mode and continues the story, providing its 4 options as usual
>repeat for any NPC that is engaged in dialogue
is something like that possible at all? i just don't like the AI impersonating me and determining exactly what i say in dialogue with NPCs
i can already get the first half working, it's just the dialogue handling that's not ideal
maybe there's some extension that handles that or something i could add to my prompt?
>>
>>101736266
>maybe there's some extension that handles that
Yup. It is called 2MW.
>>
>>101736196

>>101736197
>>101736208
>>101736235
>>101736241
>>101736247
>>101736256
>>101736266

Incidentally, this conversation proves that you are a hypocrite; and it is your hypocrisy which is still able to make me angry. I try and start a conversation about samplers which is entirely relevant to /lmg, and that provokes you to retaliate, because you think I am supposedly ruining the thread. Other people talk about things that have nothing to do with local language models, and they get a complete pass from you.

It isn't about staying on topic, with you; it's about a personal vendetta.
>>
>>101736347
>It isn't about staying on topic, with you; it's about a personal vendetta.
Is it victim complex hours now?
>>
>>101736347
you don't understand these threads. The only thing that really keeps them moving and not down to page 10 is humor. So poking fun at discordfags (and me getting caught in the crossfire) is pretty par for the course here.
also stop arguing faggot you can just ignore bad faith arguments.
>>
>>101736347
>I try and start a conversation about samplers which is entirely relevant to /lmg
a discussion that's been had with every new meme sampler ever released btw
>Other people talk about things that have nothing to do with local language models, and they get a complete pass from you.
Yeah threads are shit, nothing i can do about it other than annoy you, sad innit mate?
>>
>>101736347
Admittedly I never visited any generals for a long time but it is incredible how many of you retards think everyone you hate is one person.
>>
>>101736347
>waaaaaah why is this not like my discord safeplace :(
>>
>>101736399
Petrus likes reddit, not discord; he spent twelve years there or something, which explains his reddit spacing and general attitude
>>
>>101736388
>a discussion that's been had with every new meme sampler ever released btw
Which is a bad thing and must immediately be stopped, but random shitposting about Discord is completely fine, and anyone who points out this inconsistency has a victim complex?

OK. I think I understand now. Thanks for helping me clear this up.
>>
>>101736423
>anyone who points out this inconsistency has a victim complex?
no, just you in particular

>I also pissed enough people off in my own right, (mainly due to my support of Undi) that the confusion between me and Petra was somewhat deliberate.
>although I know I will receive shrieks and howls in response.
>Even more so if someone shits on this post.
>I know that the people who hate me will most likely try and use said post as a means of getting me banned.
>everyone who attacks him is mindbroken incel scum
>>
kobo won btw
>>
>>101736441
What did it win?
>>
>>101736435
So you're admitting that the reason why you do it in my case, is because you get a reaction from me. I'll remember that, the next time you try and claim that it's because I'm supposedly ruining the thread.
>>
Test
>>
>>101736460
>So you're...
no, stop being schizo and putting words in people's mouths, that's unsanitary
>>
>>101736136
What? That's grammatically correct.
>>
>>101736471
He's saying he's not large, so his immersion is ruined by that. reading comprehension ya know?
>>
chameleon.cpp never
>>
>>101736495
Good way too dangerous for the schizos in this thread.
>>
>>101736486
Yep, realized afterwards. I guess I'm so scanning-for-slop-pilled that I failed to consider the most obvious immersion-ruined joke there.
>>
>>101736512
Is that log slop?
>>
>>101736524
I dunno, I usually RP my fetish (which has nothing to do with sex) so my library of slop is very different from the normal person's. However, it does seem to be free of a lot of language that I see frequently in most other people's logs. It's definitely purple, though. FEELS like slop, even if it's not, y'know?
>>
>>101736552
what fetish
>>
>>101736561
Probably the piss anon. He likes it when you hold him by the belly and tell him "you can do it" when he tries to piss himself.
>>
>>101736561
>>101736575
Nope, stomach growling guy. I did make my own dataset and train a LORA like the piss guy did, though.
>>
>>101736524
>...meeting his gaze with a mixture of x and y
>(eyes shining) (with a mixture of x and y) (again)
>her voice trembled, (eyes shining) (again)
>her body trembled (nice repetition there mixtral)
also two times pussy aching in pleasure
yes
>>
OpenAI is about to blow your mind with Active Inference... So what is it?

> First introduced in the early 2000s in a series of papers by neuroscientist and theoretical neurobiologist, Karl Friston - active inference is a theory of how the brain uses statistical inference and generative world models to predict sensory inputs and guide actions to minimize prediction errors - helping explain human perception, action and learning

> Perception updates our generative world model to reduce errors in prediction while actions change our environment to align with our predictions - minimizing the probability of errors in our predictions

> It is likely that with a combination of enough compute, advancements in continuous learning / information retrieval with causal grounding and layers of active inference methods like GoT, AoT, CoV and MCTS - we may be inching closer and closer to a generative, continually learning model that operates at near-human levels of cognition

I suspect that we might start to see the emergence of “energy-based” models (EBM) that operate more dynamically and continuously learn with hot-swappable memory partitions that evolve over time.

They will intelligently route and adjust the level of compute needed for more sophisticated means of active inference based on the complexity of a given query. These models will also allow users to explicitly define how much energy / reasoning strength is needed at inference time.

I'll be posting more on how this works under the hood soon with layered graphs-of-thought (GoT) and algos like monte-carlo tree search (MCTS)... For more on active inference check out these incredible papers:

> Friston, K. (2003). "Learning and inference in the brain."
> Friston, K. (2005). "A theory of cortical responses."
> Friston, K. (2006). "Free-energy principle for perception and action."
>>
>>101736590
>but I haven't seen this little repetition for a very long time.
kek
>>
>>101736590
How do I produce non-slop?
>>
>>101736590
This, it's like... generically slop. I didn't really explain it properly in my feedback. It's free of the usual isms, but the structure is extremely sloppy. Lots of XYshit.
>>
>>101736594
I can't wait for /lmg/ to turn into anons telling each other that they have given up on trying to make their bot do their fetish properly. And all the smug "skill issue - mine is doing what I want" that will follow.
>>
>>101736594
go to bed sam
>>
>>101736621
You explained it well enough. I was able to understand.
>>
>>101736583
I'm honestly a little envious. I wish I could have a fetish I love enough be motivated to learn the tougher parts of this.
>>
>>101736640
It is the main purpose of electric women - motivate you to do shit.
>>
>>101736594
>OpenAI brings out new paradigm of AI
>First generation makes SKYNET look like a vegetable by comparison
>Model says things which aren't woke/"safe" enough
>Media hitpieces ensue
>NERF NERF NERF ALIGN ALIGN ALIGN
>Model IQ reduced to <40
>"Safety" restored
>Lather, rinse, repeat
>>
>>101736594
nothingburger.
>>
>"Please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please, please,"

What usually causes a model to do this?
>>
>>101736594
Cool, I can finally teach models that when I'm kissing feet I expect them to not have calluses
>>
Let's have a high quality discussion: am I the only one who finds these models predictable and repetitive? Is there a solution to this problem? Like with flux, I can generate a wide variety of images. With llama, mistral, I get the same boring replies every time.
>>
>>101736797
different mediums and ways to execute results; imagegen works with noise, so infinite results, while language here is finite.
>>
>>101736760
Shit model, overbaked tune... or maybe it really wants that thing... like really really really really really really really really...
>>
tiny gemma 2 can be finetuned in google colab btw
https://colab.research.google.com/drive/1FeFeM1viF6jNJDYgUXflRNFL0ue252zD?usp=sharing
>>
how viable are used m1 macs with 32gb?
>>
>>101736877
If you try to ERP with models that have virtually no vocabulary related to sex, is it possible they start producing repetition and other weird things because of that?
>>
>>101736891
just get an a6000 or you'll never be happy
>>
>>101736909
Big models are not much better than small models these days
>>
>>101736797
if you were writing "anime girl with huge tits" into flux every time you would get very samey images, just like what's happening to you in llms when you write "you are an expert roleplayer acting as an anime girl with big tits. user: ahh ahh mistress. assistant:"
>>
>>101736927
vramlet cope
>>
>>101736900
Could be, but i've seen models just go into a loop where the only reasonable token is one it just used, and keep going forever in many contexts. I've only seen it with <= 4B models, and old ones at that. What model are you using?
>>
>>101736974
I have 72 gb vram across 3 GPUs. I just don't know what to run. Mistral large is very good I guess, but it's not much better than gemma 27b
>>
>>101736760
Once you write the 3rd please it can't resist the allure of repeating the pattern.
>>
>>101736956
How do I make it less predictable then, considering I am myself predictable?
>>
>>101736974
>le vramlet
You're only shitting this out because most people can't refute your *random-huge-model-name* hypetrain claims.
>>
>>101736974
Buyer's remorse cope. 30b is the upper limit of what anyone would ever actually need
>>
>>101728689 google AI has a 2M context window, couldn't you just feed it a dozen light novel volumes and then ask it to describe a certain character?
>>
>>101737055
i have 4gb vram
>>
>>101736927
Is this an Apple ad?
>>
>>101736266
flip to page 5
>>
>>101736607
2020
>>
how to fix masochistic bots being all like:
"is that all you got coward? You're not gonna break me this easily"
after i fucking sawed her arm off or something.
model: mistral large q6
SillyTavern settings: untouched
i look in the descriptions of the cards and nowhere does it say they should behave like this wth
>>
>>101736927
Searingly vindictive, elitist response about "VRAMlet cope" from desperately insecure, 3090 rack owners incoming.
>>
>>101736889
>2B
I don't have sex with infants, thanks.
>>
>>101736994
I was using Mixtral Dolphin 2.5. I'm now about to use Limarp-ZLOSS for I think the first time.
>>
File: gemma-2b.jpg (342 KB, 1365x2048)
>>101737192
>2B isn't popular anymore
What went wrong?
>>
>>101737003
If I had your VRAM, I'd probably be running Goliath.
>>
>>101737262 (me)
my name is Alpin, btw.
>>
>>101737252
I want at least a 70B with a body of 2B.
>>
>>101737274
Hi Alpin
>>
File: miku no.png (90 KB, 1117x277)
man I love mistral large
>>
>>101737252
she's the ideal choice for stoic coombot
she's still popular what do you mean
>>
>>101737262
Hi Alpin, I did try Goliath (q4 k m). I don't think it's a particularly good model. It also has a context length of 4096.
>>
https://github.com/ggerganov/llama.cpp/pull/8878
>ggml : make GeLU faster and more accurate on CPU
>This change makes GeLU go 8x faster on Intel, 3x faster on Apple Silicon, and 2x faster on Threadripper. It's the world's most popular activation function, crucial to models such as Whisper and Gemma. On those models, this change can have a noticeable improvement in performance. That's because GeLU is usually the most time-consuming op except for matrix multiplication
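For reference, the op being vectorized is the tanh approximation of GELU. A scalar Python sketch for illustration only; the PR itself is SIMD code inside ggml:

[code]
import math

def gelu_tanh(x: float) -> float:
    # tanh approximation used by most LLM inference code
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi)
                                      * (x + 0.044715 * x ** 3)))

def gelu_exact(x: float) -> float:
    # erf definition; the "few ulp" accuracy talk is about the gap to this
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

print(gelu_tanh(1.0))   # ~0.84119
print(gelu_exact(1.0))  # ~0.84134
[/code]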
>>
whos the ai now, bitch?
>>
>>101737454
Imagine a world where Jart isn't a snake.
>>
>>101737454
lcpp boutta get jarted again, remember to disable mmap folks
>>
File: smi_output.png (35 KB, 1021x519)
>>101737184
VRAMLET COPE
>>
>>101737526
>Tue Mar 19 15:27:50 2024
>>
>>101737526
her unbelievably high amount of shivers and mischievous grins...
>>
gpt holding..
>>
>>101737503
>>101737504
I expect crash and/or perplexity loss on most systems. I still run with --no-mmap.
>>
>>101737535
The rack has burned down since then
>>
>>101737565
hi petra
>>
>>101737029
Go to other human writers and artists for inspiration, unironically. People need inspiration/input from others or even something as creative as the brain winds up becoming repetitive and samey.
>>
>>101737454
>Test failure appears to be unrelated. Something having to do with JSON. ./tests/test-backend-ops -b CPU perf works fine locally on my Mac Studio.
>Works on my machine + humble brag
>>
>>101737565
>Due to the sensitive nature of activation functions, I encourage you all to evaluate its impact on model output before merging. Vectorizing GeLU required trading away a few ulp of worst case accuracy compared to libm. LLMs normally have limitless tolerance for errors, but due to the nature of tanhf() this is a case where even off by ones can cause user-visible changes in model output. It is my belief, based on my own personal experiments so far, that this code works well for llama.cpp, whisper.cpp, gemma, etc.

>This software was developed by Mozilla Ocho and ARM Limited. It first appeared in llamafile which offers you llama.cpp / whisper.cpp / stable-diffusion.cpp with the most bleeding edge performance optimizations and binary distributability.

Literally doing an ad in the pr too.
>>
>>101737589
kek, guess what the error is:
>23: GELU(type=f32,ne_a=[7,13,19,23],v=0): [GELU] NMSE = 0.000001290 > 0.000000100 FAIL
I wonder if it related, surely not.
>>
>>101737454
>8x faster
OH MY GOD. 8 TIMES FASTER?!?!?!?! OH MY GOD!!!!! Activation is like 10% of token compute right?
>>
>>101737669
Never trust jart's word, remember his previous claim:
>Your inference commands should load 100x faster
>You may be able to safely load models 2x larger
It resulted in a shitty option that anyone who wants to optimize llama.cpp disables.
>>
>>101737680
>It resulted in a shitty option that anyone that want to optimize llama.cpp disable.
Wait, mmap?
>>
>>101737689
yeah, that's from jart, well kinda, and quite a few things recommend disabling it
>>
File: file.png (132 KB, 602x519)
>>101737689
>https://github.com/ggerganov/llama.cpp/pull/613
>>
Why does ollama use the SSD so much before downloading a model?
I'm guessing it writes in the spot where it will store the model it is about to download, but why?
It is very annoying with the bigger models and it takes a very long time.
>>
>>101737763
I assume it writes an empty file with the size of the actual model to make sure there's enough storage. May help with fragmentation, but i doubt that's the reason. I bet on the first option.

Also, lol, ollama.
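If that guess is right, it's basically a one-liner. A sketch of the two usual ways to reserve the space; plain truncate is sparse and instant on most filesystems, while the fallocate variant actually reserves blocks, which would better explain the visible SSD activity:

[code]
import os

size = 40 * 1024**3  # e.g. a ~40 GB blob about to be downloaded
with open("model.partial", "wb") as f:
    f.truncate(size)  # sparse: instant, no real writes yet
    # os.posix_fallocate(f.fileno(), 0, size)  # Linux: actually reserves blocks
print(os.path.getsize("model.partial"))  # reports the full size either way
[/code]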
>>
mistral large is good, it sends a shiver down my spine
>>
File: 1694610008560229.png (119 KB, 846x344)
>>101733421
Facts about finetunes from the finetune itself.
>>
So last week was disappointing, are we going to get a Cohere release this week?
>>
>>101737763
Why are you using ollama? They basically obfuscate all the files so that you stay in their ecosystem, same with their API having more features than the OAI-compatible one (all the extra samplers are only in their native API).
>>
File: 1721239395352568.jpg (106 KB, 486x758)
106 KB
106 KB JPG
>>101737801
Stopping everything and forcing the girl you've been sloppy topping to write a recipe for chicken wings or an entire python program is my favorite shit, it's so funny.
>>
>>101737796
>>101737813
It just works, until it doesn't.
That problem is the reason I'm going to stop using it, or at least use it less often.
I'm just a lazy guy
>>
>>101737841
I can't ever recommend ollama to a novice because they use terrible defaults: using the q4_0 quant is stupid, defaulting to a 2048 context window is stupid, having to rebuild a model to change parameters is stupid. Also, just importing a GGUF is annoying; it copies the entirety of the model into those obfuscated blobs, for example. Hell, they don't even respect the XDG dirs on linux and put everything in your home dir.
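The rebuild-to-change-parameters part at least has a workaround: ollama's native API accepts per-request options, so the 2048 default can be overridden without touching a Modelfile. A minimal sketch; the model tag is whatever you have pulled:

[code]
import requests

r = requests.post("http://localhost:11434/api/generate", json={
    "model": "llama3.1",  # any pulled model tag
    "prompt": "Hello",
    "stream": False,
    "options": {"num_ctx": 8192, "temperature": 0.8},
})
print(r.json()["response"])
[/code]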
>>
>>101737911
It's kind of funny. The things people are "supposed" to use like ollama and vllm are all garbage.
>>
>>101738031
What makes vllm garbage?
>>
>>101737911
Well, I don't understand half of what you said, so that's the reason why I use it.
I'm building llama.cpp as we speak
>>
Is nemo the best RP model? I'm having 500-post long sessions with 0 tard wrangling and the default sys prompt, vs the usual 20/30 with other models.
>>
>>101738031
>>101738055
vllm in comparison is quite good, at least it's not just a wrapper around another project. It has really good batching performance; major drawbacks are that it's not made for personal usage (quants took a long time to be implemented, for example), all the performance improvements compared to just using pytorch are in batching, and it's horrible to compile.
>>
>>101733421
>>101737803
I'm trying it and it seems pretty cool so far.
It replies to requests with comprehensive but concise responses and it seems as intelligent as the official instruct.
It did fall into a repetition trap at one point however.
Now to see if it can get in and out of roleplaying without getting retarded like Celeste 1.9 (1.6 works fine).
>>
>>101734153
>pic
>current month
>retard doesnt know what DRY is
should be an insta ban
>>
holy crap, does it know?
>>
>>101734976
i really cant imagine how much of a retard you have to be to pretend to be retarded online 24/7 every day, just for attention. what a grim existence, hope the basilisk nukes the entire world
>>
>>101738135
>doesnt know what DRY is
A reddit meme?
https://www.reddit.com/user/-p-e-w-/
>DRY author here
https://old.reddit.com/r/LocalLLaMA/comments/1ej1zrl/try_these_settings_for_llama_31_for_longer_or/lgbjtox/
>>
>>101738169
>guilt by association into an adhom
lowest of the low iq
>>
>>101737807
Memes aside, they never said anything about a new model since R+. There's some mentions here and there of raising money and partnering with other companies.
Two beta features that I'm aware of came out within the past month: JSON output for the /v1/chat API, and "Prompt Tuner" in the Cohere Dashboard that lets you input a prompt with variables and define criteria, and it will iterate through variations of the prompt to find which produces the best response.
And a variation of their Rerank model that gooners also don't care about.
They seem more focused on corporate features and support than fighting for the position of the next greatest models.
>>
>>101738169
>a reddit meme
Bro has been raw dogging new models without DRY
>>
>>101738222
Yup and getting no repetition regardless, insane right? Crazy to think decent models don't need band aids to work at all
>>
*pours water on your sampling* Heh... not so dry anymore now, huh?
>>
>>101738232
Every open source model past llama1 has terrible repetition problems
>>
>>101738149
>i burst into laughter at my funny joke
based user
>>
>>101738258
Works on my machine, mistrals do repeat like mad, others don't, at all.
>>
have computers entered the gpu age where gpus are more important than cpus for the next decades?
>>
>>101738096
I had to ask it to respond in alphabetical order just to make it not go into a literal endless loop.
>>
>Why is nobody taking about InternLM 2.5 20B?
>This model beats Gemma 2 27B and comes really close to Llama 3.1 70B in a bunch of benchmarks. 64.7 on MATH 0 shot is absolutely insane, 3.5 Sonnet has just 71.1. And with 8bit quants, you should be able to fit it on a 4090.
>Vibe check puts it in range of Llama 3 70B for me
>https://huggingface.co/internlm/internlm2_5-20b-chat
>https://www.reddit.com/r/LocalLLaMA/comments/1ekr75a/why_is_nobody_taking_about_internlm_25_20b/
>>
Jart is the primary mikuposter.
>>
>>101737333
Checked. Are any of them capable of acting like a normal person, ideally which recall basic facts you've mentioned?
>>
AI friend
>>
>>101738272
Even llama2 was repetitive. I remember the guy who made chub.ai coming here and complaining about it
>>
>>101738273
No, because consoomers will basically be guaranteed to be CPUmaxxing for eons to come because Nvidia has such a fucking tight grip on secondhand server GPUs now and refuses to make anything of reasonable size for less than 10k dollars.

Either RAM is going to get a LOT better, or this is gonna shunt us into the era of specially made, M2-esque systems that are kind of a mix of both and have way more contact with memory.
>>
>>101738304
>>101738258
Do you think llama 1 era models didn't repeat a ton because they were really retarded?
>>
>>101738304
Worked for me? What can I say except cope sampling issues? You're probably causing the repetitions by using forty different samplers at once.
>>
>>101738309
nvidia announced a 3 month delay on their b whatever to replace the h100.

:^)

Or, sales prospects are dismal and they are collectively trying to save the Harris campaign.
>>
>>101738135
It's not worth it, it seems to make it dumber.
>>
I like the idea of dry, and it does work fairly well, but I've also noticed the model gets around it by doing shit like fucking up the tenses of the verbs it uses in its slop phrases so that it has an excuse to write them again, or otherwise writing increasingly-incoherent variations on the same slop instead of actually becoming more original
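For anyone who hasn't looked at how it works: DRY penalizes a token if emitting it would extend a sequence that already occurred, harder the longer the repeat. A toy token-level sketch of that rule, not any backend's actual implementation; you can see from the shape of it why tense-swapped slop slips through, since one changed token resets the match length:

[code]
def dry_penalty(context, token, multiplier=0.8, base=1.75, allowed_length=2):
    """Penalty for `token` if emitting it would extend a repeated sequence."""
    best = 0
    for i in range(len(context)):          # earlier occurrences of `token`
        if context[i] != token:
            continue
        n = 0                              # length of the suffix match preceding it
        while n < i and context[i - 1 - n] == context[len(context) - 1 - n]:
            n += 1
        best = max(best, n)
    if best < allowed_length:
        return 0.0
    return multiplier * base ** (best - allowed_length)

ctx = "I am happy . I am".split()
print(dry_penalty(ctx, "happy"))  # 0.8 -> closing the loop gets penalized
print(dry_penalty(ctx, "tired"))  # 0.0 -> a fresh continuation is untouched
[/code]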
>>
>>101735890
>Experimenting with heavy Top A and minor Typical P on Dolphin Mixtral 2.5, and moving them up in the sampler order accordingly. Min P 0.05, DynTemp 0.4-2.35, Smoothing 0.24, Mirostat 2 5 0.95.
Sampler soup leads to:
>>101736590
>...meeting his gaze with a mixture of x and y
>(eyes shining) (with a mixture of x and y) (again)
>her voice trembled, (eyes shining) (again)
>her body trembled (nice repetition there mixtral)
>also two times pussy aching in pleasure
Samplers are cope
>but my guess is that it is something that was useful in the early days when base models used to fall in repetition loops quite easily. Today, there is almost 0 reasons to use it. So probably it is not worth investing in it
https://github.com/ggerganov/llama.cpp/pull/5561#issuecomment-1951389775
>Is this the base model or the instruct model? My experience with the instruct model is that it never enters repetition loops with temp 0 and all repetition penalties disabled.
https://github.com/ggerganov/llama.cpp/pull/5561#issuecomment-1951874469
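If you want to reproduce the temp-0 test from those PR comments yourself, something like this works (a sketch using llama-cpp-python; the model path and prompt are placeholders):

from llama_cpp import Llama

llm = Llama(model_path="mistral-large-instruct-2407-q4_k_m.gguf", n_ctx=8192)
out = llm(
    "[INST] Write a short story about a lighthouse keeper. [/INST]",
    max_tokens=512,
    temperature=0.0,     # greedy decoding, the worst case for loops
    repeat_penalty=1.0,  # 1.0 = repetition penalty disabled
)
print(out["choices"][0]["text"])

If it loops under these settings the model itself is repetitive; if it only loops with your usual sampler stack, the samplers are the problem.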
>>
>>101738282
I'll check it out.
Thanks mr shill
>>
>>101738343
the recommended 0.8 multiplier doesn't fuck up either Nemo 12B or Largestral 2
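For anyone wanting to try the same values: the parameter names below match what text-generation-webui exposes, as far as I know (the values are the defaults under discussion):

params = {
    "dry_multiplier": 0.8,    # overall penalty strength; 0 disables DRY
    "dry_base": 1.75,         # exponential growth per extra matched token
    "dry_allowed_length": 2,  # repeats up to this length are not penalized
    "dry_sequence_breakers": ["\n", ":", "\"", "*"],  # matching resets at these
}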
>>
Greetings Gentlemen. I've been away from this world for a couple of years. What is the best LLM to run on an old Asus Ryzen 7, GTX 1650 (4GB VRAM), 8GB RAM? Is there anything worth running? Should I stick to Claude/ChatGPT? I really don't like sharing personal data with these cloud services.
>>
>>101738284
I would be a mikuposter too if I could run flux
>>
>>101738468
Yes it does, you think it just has zero drawbacks?
>>
>>101735890
>Mirostat 2 5 0.95.
>>98852913
>Inb4 "Mirostat disables all other samplers." Shut the fuck up, I don't care. Also shut the fuck up about accusations that I am "deliberately spreading misinformation" because I am not telling anyone that they are legally obligated to listen to me.

>>97888891
>Mirostat is known to make mixtral repeat / dumb. Also mirostat disables all those settings besides temp and rep pen.

erm.
>>
>>101738517
seems like you have other settings that clash with it then, remove everything except min-p 0.1, since no, it doesn't fuck up anything else, when it comes to creative writing of any kind that is; obviously you aren't going to use it for an assistant or coding
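For the record, min-p is about the simplest filter there is; a sketch, assuming the standard definition:

import math

def min_p_keep(logits, p=0.1):
    # Convert logits to probabilities (shifted by the max for stability),
    # then keep every token whose probability is at least p times the
    # most likely token's probability.
    m = max(logits)
    probs = [math.exp(l - m) for l in logits]
    total = sum(probs)
    probs = [q / total for q in probs]
    cutoff = p * max(probs)
    return [i for i, q in enumerate(probs) if q >= cutoff]

The survivors get renormalized and sampled; because the cutoff scales with the top token's probability, the filter adapts to how confident the model is, which is the whole trick.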
>>
>>101738479
what's your use case?
>>
>>101738479
>What is the best LLM to run over an old Asus Ryzen 7, GTX 1650 (4GB VRAM), 8GB Ram?
https://huggingface.co/TheDrummer/Gemmasutra-Mini-2B-v1
Or
https://huggingface.co/concedo/KobbleSmall-2B
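A sketch of running a 2B GGUF like those entirely on a 4 GB card with llama-cpp-python (the file name is a placeholder; grab whichever Q4 quant exists for the models above):

from llama_cpp import Llama

llm = Llama(
    model_path="gemmasutra-mini-2b-v1-q4_k_m.gguf",
    n_gpu_layers=-1,  # a 2B at Q4 is roughly 1.5 GB, so every layer fits in 4 GB VRAM
    n_ctx=4096,
)
print(llm("Hello there. What can you do?", max_tokens=128)["choices"][0]["text"])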
>>
>>101738558
I do not have other settings. You just haven't noticed the effect.
>>
>>101738595
just wait a few weeks, then DRY will be just like dynatemp, smooth sampling, etc
>>
>>101738258
this is such a blatant lie that I can only consider it low-quality trolling
>>
>>101738620
claude and chatgpt are themselves repetitive.
>>
>>101738573
General idea bouncing on personal plans/topics.
Hopefully reading/talking to documents.
Research assistant (this is likely too much)

>>101738585
Thank you, I'll look these up.
>>
>>101738282
>MMLU 73.5
Meh
>>
>be mistral model
>people spread lies that I repeat myself
>people spread lies that I repeat myself
>people spread lies that I repeat myself
>people spread lies that I repeat myself
>[/INST]
>>
>>101738282
>Limitations: Although we have made efforts to ensure the safety of the model during the training process and to encourage the model to generate text that complies with ethical and legal requirements
stopstopstop plz no more safety
>>
>>101738630
works on my machine
>>
>>101738595
>You just haven't noticed the effect.
feel free to post it nigger
>>
>>101738672
but its on clouds and stuff not your machine
>>
>>101738680
my uncle works in OpenAI
>>
>>101738657
>be retard
>cant post logs
>cant turn on dry
>cant run models larger than 30b above q4
every time
>>
>>101738691
sorry to hear that anon
>>
>>101738694
>be retard
>be retard
>be retard
>be retard
>be retard
>>
>>101738705
>pretending to be this retarded
youre lower iq than a tranny lmao
>>
>>101738694
>cant turn on dry
why won't people use my cope sampler reee
>>
File: 1700895471893331.png (346 KB, 1280x1344)
>>101738705
>>101738657
>>101738620
you vill not use local models goy, you VILL send us all your data instead

incredible how companies shit themselves so hard at a single FOSS general on 4chan that they have to spend money on acting retarded
>>
>>101738714
so organic, right? I see him in every thread recently
>>
File: 1457378512724.jpg (37 KB, 500x489)
is there a local TTS model with voice cloning yet?
>>
>>101738772
feel free to post a single log comparison where dry doesn't work for you tranny, keep dilating
>>
>>101738753
>say that models from one particular corpo are repetitive
>HURR DURR WHY YOU HATE LOCAL, MUST BE SAM ALTMAN IN HIS OWN LIZARD FORM
mental illness
>>
You're trying way too hard to fit in -p-e-w-
>>
>>101738815
>no pic
didnt read troony, keep seethin
>>
>>101732405
real women are the same
>>
>>101738782
RVC
>>
>>101738782
XTTS is the SOTA local TTS model as far as I know. You can fully finetune it and/or give it reference audio. But having tried finetuning it on a game character's voice lines, the result is not great: you can recognize the character, but you can tell that it's TTS.
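Zero-shot cloning with XTTS v2 is only a few lines with the Coqui TTS package, if memory serves (the reference wav is a placeholder; use a few clean seconds of the target voice):

from TTS.api import TTS

tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
tts.tts_to_file(
    text="Works on my machine.",
    speaker_wav="reference_voice.wav",  # short, clean sample of the target voice
    language="en",
    file_path="out.wav",
)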
>>
Does fine-tuning Llama 3.1 (with a LoRA) degrade its ability to be prompted?

or fine tuning models in general
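For context, attaching a LoRA with peft usually looks like this (a sketch; the model name and hyperparameters are illustrative). Keeping the rank low and targeting only the attention projections is one common way to limit how far the tune drifts from the base model's instruction-following:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3.1-8B-Instruct")
config = LoraConfig(
    r=16,                 # low rank = fewer trainable params, less drift
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)
model.print_trainable_parameters()  # typically well under 1% of the base model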
>>
Are the models limited by RAM speed?
If so, does that mean I could run a 400B model just as fast (or rather, as slow, lol) as a 30B model if I have enough RAM?
>>
>>101738923
>or fine tuning models in general
see
>>101735383
>>
>>101738938
>If so does that mean I could run a 400b model just as fast (or rather slow lol) as a 30b model if I have enough ram?
No, it has to do far more math too
>>
>>101737801
This is very stupid, and I love it. I'll probably try some of it.
>>
mistral large has me getting way too into the RP and writing multiple full paragraphs after a career of being a one-sentence lazy replier
>>
>>101738971
I see. So at a certain point they start to get limited by the CPU rather than the memory speed?
>>
>>101739012
both really
>>
>>101739008
I'll only get worse from here, anon
Godspeed
>>
>>101739012
At one point or another, the entire model has to be moved from RAM to cache and then to registers for every token you generate. The bigger the model, both in parameters and actual size, the more you have to move around. For CPU inference, RAM speed is the biggest bottleneck.
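Back-of-the-envelope version of that, since it answers the 400B-vs-30B question directly: each generated token streams the whole model through RAM once, so bandwidth divided by model size gives a rough ceiling on tokens per second (numbers below are illustrative):

bandwidth_gb_s = 64  # e.g. roughly dual-channel DDR5-4800
for params_b in (30, 400):
    size_gb = params_b * 0.6  # Q4_K_M is about 0.6 bytes per parameter
    print(f"{params_b}B @ Q4 ~= {size_gb:.0f} GB -> at most {bandwidth_gb_s / size_gb:.2f} t/s")

So with enough RAM the 400B runs, just over ten times slower than the 30B, before even counting the extra compute per token.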
>>
>>101738282
But how good is it at RP?
>>
File: 00746-6924931823.jpg (112 KB, 1024x1024)
>>
>>101739065
According to a Redditor:
Refuses to generate content which is not appropriate for all users. But it's really good at answering enterprise resource planning questions, which is almost as hot, right? Right?
https://old.reddit.com/r/LocalLLaMA/comments/1ekr75a/why_is_nobody_taking_about_internlm_25_20b/lgmuw2h/
>>
>>101739089
Useless then. Why would I use anything but Claude 3.5 for work shit?
>>
>>101739089
>anon, why do you hate us?
>we just want to make the world a better place, so why are you angry at us
>we can't have kids access the model
>this is for the best
I hate western culture
>>
>>101732172
> Tranime pic
Lol, fucking loser
https://youtube.com/watch?v=bO-NaEj2dQ0
>>
>>101739165
And yes I know the model is chinese, but it was the US puritans who started this shit
>>
>>101738096
>>101738281
Well shit, scratch that. I was using the Mistral Nemo format by accident when the correct one is ChatML.
It still repeated entries when I asked it to list some things, but not endlessly.
With the correct template, it's not bad! It's even proactively engaging with me (the user), asking questions, wanting to know more about what we are doing, etc.
Not bad at all.
Now to see how capable it is at switching from Game Master to NPC and back.
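For anyone else who hits this: the two formats look nothing alike, which is why the wrong one degrades output so badly. Roughly (the prompt text is just an example):

# InternLM 2.5 expects ChatML-style tags:
chatml = (
    "<|im_start|>user\n"
    "List three fruits.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
# Mistral-family models expect [INST] tags instead:
mistral = "[INST] List three fruits. [/INST]"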
>>
>>101739181
>he uses a tranime site
Lol, fucking loser
>>
>>101739066
I like this Miku
>>
File: nottranime.png (165 KB, 1080x690)
>>101739219
Doesn't seem like tranime to me saar
>>
https://old.reddit.com/r/LocalLLaMA/comments/1ekx1bi/going_to_commit_to_llama_and_mistral/
>Going to commit to Llama and Mistral
>submitted 17 minutes ago by migtissera
>Hey everyone, I'm thinking of only committing to Llama and Mistral models from now on. And even with those models, I'm putting the lower bound at 70B parameters. There's plenty of guys finetuning other models, but I feel like having a focus is needed right now to preserve the quality of models. What sizes and models do you usually run?

What a tragic loss for vramlets.
>>
>>101739262
Notice how "japanese animation" was the first ting mentioned.
No need to cope like this.
>>
Notice how it's one schizo that suddenly started screeching and calling other people trannies out of the blue in a thread about technology.
>>
>>101739281
who?
>>
File: 1722038794823795.png (262 KB, 894x998)
>>101739196
Decent enough as an assistant, considering its size. Rate the build.
>>
>>101739291
>>101739304
Samefag
>>
>>101739313
Tess 3 405B guy https://huggingface.co/migtissera/Tess-3-Llama-3.1-405B
>>
>>101739281
/lmg/: a thread where Discord users discuss Reddit threads. A tragedy.
>>
>>101739325
Notice how both posts start with the same word.
>>
Notice how many faggots are in this thread
>>
File: sf.png (4 KB, 343x103)
>>101739325
uh-huh
>>
>>101739323
>upgrade from r5 5600x to r9 3900x
eh
>upgrade gpu for more vram
from 3060 12gb to 3080 10gb
eh again
>>
>>101739332
>Nooo you can't discuss the fact a great model tuner decided to leave vramlets with undi and co.
>>
>>101739350
>" What is inspect element"?
>>
File: indeed.png (26 KB, 590x194)
>>101739408
>>
File: s.png (5 KB, 384x84)
>>101739408
>>
>>101739403
>a great model tuner
This is him, isn't it?
>>
File: thumbypoo.jpg (1.08 MB, 2048x2048)
>talking
>>
>>101739408
you are inspected element, don't call me that
>>
Dont use textgenwebui
>>
File: 1696240785119525.png (180 KB, 929x397)
>>101739357
Yeah, okay, it's kinda dumb.
>>
>>101739408
I am inspect element. Do you have something to say to me, buddy?
>>
>>101739624
I inspected your mother's elements, pal.
>>
>>101739574
It is a simulated woman so that's just as expected.
>>
>>101739747
>>101739747
>>101739747
>>
>>101738691
My dad is CEO of Nintendogs International (a subsidiary of Nintencats) and he said OpenAI has no actual human workers. Therefore you should be banned for being a ROBOT.


