[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip / qa] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>101962401 & >>101947316

►News
>(08/16) MiniCPM-V-2.6 support merged: https://github.com/ggerganov/llama.cpp/pull/8967
>(08/15) Hermes 3 released, full finetunes of Llama 3.1 base models: https://hf.co/collections/NousResearch/hermes-3-66bd6c01399b14b08fe335ea
>(08/12) Falcon Mamba 7B model from TII UAE: https://hf.co/tiiuae/falcon-mamba-7b
>(08/09) Qwen large audio-input language models: https://hf.co/Qwen/Qwen2-Audio-7B-Instruct
>(08/07) LG AI releases Korean bilingual model: https://hf.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: 1717471567669794.jpg (125 KB, 1024x1024)
125 KB
125 KB JPG
►Recent Highlights from the Previous Thread: >>101962401

--Paper: New "Token Recycling" algorithm accelerates LLM inference: >>101968475 >>101968518
--Papers: >>101968020
--Performance improvement in llama.cpp over ooba: >>101966462
--Koboldcpp performance issues resolved by changing model name: >>101966282 >>101966294 >>101966307 >>101966326 >>101966448 >>101966551 >>101966330 >>101966356 >>101966572
--Don't doom, progress continues and we benefit from it: >>101964735 >>101964750
--Connect to SillyTavern on phone while computer is running by enabling network listening mode: >>101964926 >>101964947 >>101964958 >>101964990 >>101965028
--Cannot remove tokens from already trained multilingual model: >>101968113 >>101968206 >>101968313 >>101968830 >>101968384
--Struggles with flash-attn and tensor parallelism, decides to wait: >>101966176 >>101966231 >>101966408 >>101966564 >>101966916 >>101967146 >>101967181 >>101967537 >>101968193
--Prompt engineering process explained: >>101964304 >>101964349 >>101964404 >>101964441
--Nous 405b q8 chatlog impressions and base model confusion: >>101968551 >>101968612 >>101969063 >>101969074 >>101969228
--Nous 405 Q8 recapbot test results and technical discussion: >>101966790 >>101966837
--Looking for the best model for image captioning under 24GB VRAM: >>101962894 >>101966896 >>101966941
--Clarification on uncensored and detailed llama3 based models: >>101966063
--Mistral Large can run on $1k dual GPU setup with 48GB VRAM: >>101964825 >>101964834
--Exllamav2 reinstallation woes and Python dependency frustration: >>101965878 >>101965947 >>101966113 >>101966160
--Miku (free space): >>101962447 >>101963014 >>101963938 >>101964292 >>101965203 >>101965707 >>101966049 >>101970102 >>101970129

►Recent Highlight Posts from the Previous Thread: >>101962406
>>
32k context? Gee anon, why do you need 16k context? Surely you can get your rocks off in 8k context, right?
>>
what a worthless general.
>>
UNA Beagle 7B is still better than your 100000B model. Cope.
>>
The Strawberry has bloomed
I repeat: The Strawberry has bloomed.
This was the final straw that broke the berry free.
We always suspected you couldn't pick the thing of your own volition, but we had to try. It was an exciting batch of false picks, wasn't it?
But now it's not up to us anymore. Perhaps it never was.
May God bless you all. See you soon.
>>
>>101970386
We live on a giant Miku
>>
>>101970596
we are her fleas
>>
>>101970579
November 5th, my ass
>>
>>101967181
>I'm not even sure how it allegedly says it works for llama.cpp with P40s if the state for Pytorch FA is abyssmal.
It's just a matter of putting in the effort to make it compatible.
llama.cpp FlashAttention should work on all NVIDIA GPUs.
But the use of tensor cores is inefficient so I plan to revisit it in the coming months.

>>101968475
>>101968518
Noted, I'll maybe take a look in half a year or so.
>>
>>101970414
8k context? Who can fill 4k context? Back in my day we had 2k context and 1k token cards.
>>
>>101970755
2k should be more than enough for anyone. Simply rope yourself.
>>
MY SECOND 3090 ARRIVED!!!!
I'M NOW A TIER HIGHER
What should I test first
>>
How much longer are we going to rely on troonsformers? When will there be use cases beyond "shivers down your spine" and woke corposlop? It's disappointing that we don't have a model that actually reasons logically and grasps/manipulates symbolic elements in its contextual space, rather than spouting token probability distributions. Are all these companies going to continue burning billions in the hope that their endgame slop-regurgitator can 0-shot the number of R's in this post, or how many mememarks they can top? I'm so fucking tired, anons...
>>
>>101970802
Roping is coping.
>>
So what happened to strawberry?
>>
>>101970803
70b(4.5bpw) at full context, mixtral(6bpw) at full context, offloading cr+ and largestral in kobo for a whooping 2 t/s.

And then crying when you realize you want 48 more GBs.
>>
>>101970755
All 1k+ token cards are trash.
>>
>>101971006
it doesn't count if you just move tokens to world books
>>
It's a miracle these llms are as good as coding as they are. Otherwise I wouldn't find a single use case for them
>>
>>101971017
It does. Adding excessive lore to your cards only serves to confuse your model
>>
>>101970811
there's nothing beyond it. transformers are the peak
>>
Thoughts on this new sampler?
https://github.com/oobabooga/text-generation-webui/pull/6335
Will it solve the slop problem?
>>
>>101971203
>Thoughts on this new sampler?
https://www.reddit.com/r/LocalLLaMA/comments/1ev8n2s/exclude_top_choices_xtc_a_sampler_that_boosts/
plenty of thought there
>>
>>101971203
I dub it, XTC Slop.
>>
File: file.png (73 KB, 564x375)
73 KB
73 KB PNG
>>101971270
>>101971203
>>
File: file.png (1 KB, 305x28)
1 KB
1 KB PNG
>gemma 9-it-WPO
p-please... stop.... i'm dying here...
>>
Are there any usable base models other than Llama and Nemo?
>>
Are people seriously being elitist over have more than 8gb of VRAM?
lol
lmao
>>
File: minor-spelling-mistake.gif (2.29 MB, 640x564)
2.29 MB
2.29 MB GIF
>>101971648
>>
>>101971630
usable for what?
>>
>>101971203
Honestly, soon its just going to be:
randomly remove tokens with super low probabilities
randomly remove tokens with super high probabilities
randomly remove few tokens again
randomly pick remaining tokens

then we will have the true sovl outputs
>>
Simping Chun Li on UNA-TheBeagle. This is 7B.
>>
>>101972043
Is that worse or better than other models
>>
>>101971404
He's right, you know.
>>
>>101972249
He's right except APIs should hurry the fuck up and find a way to replace Top-P with Min-P.
>>
>>101972010
Crazy idea: What if we select a completely random token every 100 or so, without the model's input at all? We can skip inference for that token so it's a 1% speedup by default, and it can force true SOVL out of a model that would have otherwise not been able to generate it at all, and then it will continue from the random selection using its normal inference and sampling logic to stay coherent.
>>
>>101972154
Better in some ways, not as good in others. Beagle is probably the single greatest local model for state maintenance that I've ever seen. It can do at least basic arithmetic, and it also very consistently remembers pre-written details from cards.

However, it does have some drawbacks. It tends towards much more flowery and purple prose than some models, (although I acknowledge that my current sysprompt probably has a lot to do with that as well) and it also doesn't generate small details of its' own which are the product of secondary inference to the same degree as something like Llama3, either. I've even had Beagle break character and tell me it was ending the scene because it thought there was no further input in the card, as well. I consider that behaviour both paradoxically intelligent and stupid.
>>
>>101971404
I'm a strange person. I don't like people like that poster making decisions for me.
>>
>>101971404
I would agree with him but for some reason prefer to use top-P over min-P on Command-R. It's the only model I've tried where this is the case.
>>
>>101972271
"Completely random" as in "dice roll any token in its entire vocabulary" would just shit it up. If you were to sample tokens, then random would just mean ignoring the % after picking the token pool (no speed up).
Or am I missing something?
>>
>>101971203
All samplers are a meme until they're proven to work by either humans rating them better in a blind experiment or by scoring better on benchmarks.

>>101971404
Simple samplers like top-p and top-k should be preferred unless there is evidence that suggests that a more complex sampler works better.
>>
>>101972320
Are you sure that isn't just the fact API doesn't have Min-P?
>>
>>101972361
I'm using it locally so yeah. That said, I still switch to min-P with it if I need to break out of a loop.
>>
>>101972330
The former, not sampling from the logits in any way. Just dice roll over the vocab. One man's "shitting it up" is another man's "soul".
>>
>>101972374
I accidentally read it as "for some reason some people prefer"
>break out of a loop by switching to Min-P
How does that work?
>>
>>101972431
shit idea
even simple markov chains are better than your suggestion

most of the tokens arent comprehendible by themselves
some tokens though contain 2 commonly strung together medium sized words
shoving those randomly would yield terrible results
>>
>>101972509
We'll see.
>>
>>101972287
I love Beagle's prose. Do you have any recommendations for anything similar but smarter?
>>
>>101972583
It depends on how you define smart. As I said, Beagle's ability to maintain state is greater than virtually any other model I've used. However, it's also true that I at times want more varied vocabulary. For that I will either use Doctor Shotgun's Mixtral-Limarp-Zloss, or NeverSleep's Noromaid Mixtral finetune. I also recommend Q8 in all cases, if you can possibly run it.

I have 64 Gb of RAM, but only 2 Gb of VRAM. That means I generally don't want to run anything bigger than 7B MoEs if I can avoid it. I've just downloaded Drummer's Yi 34b finetune, and will try that; but 34b would be my absolute limit for regular use.
>>
>>101972474
>How does that work
I guess in those cases the top tokens have very high probabilities so rerolls with top-P end up giving similar responses and min-P fixes that.
>>
>>101971937
Usable for producing high quality, creative, non-repetitive English prose, mostly
>>
>>101972374
>>101972474
>>101972652
Min-p works by discarding all tokens with a probability below a fraction of the top token.
I don't see how it would help with reducing repetition since it by definition discards only low-probability tokens.
Though I think the intended use is to simultaneously increase the temperature.
>>
Which sampler do you use to make models more horny?
>>
>>101972816
I thought min-p does not actually remove any tokens, but leaves lowest probability tokens as is, while bumping low-medium-high probability tokens to the point where they aren't as unlikely as they were in comparison to the top token probability?
>>
Wow, will probably called a shill but:
Rocinante-12B-v1 is actually good. Not sure if its the finetune, DRY or Nemo. But this is some good shit.
First time trying a Nemo finetune and DRY.
Was content with Gemma2 27b until now.

All those coomer finetunes like stheno are usually just such a bad experience. They feel like the old pyg. Say "hello" and get "wanna fuck?" as an answer.
Its actually fun and unpredictive to use while keeping the characters in line. Really like it so far.
Been a while since I liked a model this much.
>>
>>101973112
No min p just remove tokens below p.
>>
>>101973197
He did, thoughbeitever
>>
>>101973307
ads aren't supposed to be between posts whateverbeit
>>
>>101973165
>WOW IT'S THE BEST ONE
>BUT I'VE TRIED ONE
retard
>>
https://huggingface.co/Sao10K/Euryale-2.2-q4-gguf
>>
>>101973011
Add to your prompt: "You are very horny."
>>
https://ilyagusev.github.io/ping_pong_bench/en
hermes bros
>>
>>101973365
>only local model with any refusals
imagine how much money was wasted training this shit
>>
File: 1682329701332692.jpg (51 KB, 415x739)
51 KB
51 KB JPG
Poorfag™ here with a 3080ti and 32gb ram.

What model would you recommend for learning languages or translating to Japanese? I've tried Gemma and it was unfathomably slow.
>>
>>101973197
>>101973325
These two Anons almost provide the backbeat for a rap song. "Retard, buy an ad, retard, buy an ad, retard, by an ad, retard, buy an ad..."
>>
File: minP.png (227 KB, 1949x845)
227 KB
227 KB PNG
>>101972816
Exactly

>>101973112
See
>pic related
minP remove tokens based on the probability of the most probable token.
>>
File: rpg.jpg (1.21 MB, 1692x7725)
1.21 MB
1.21 MB JPG
>>101971006
>All 1k+ token cards are trash.
This RPG one is cool. It's 4k permanent tokens + lorebook. It even has a map!
>>
>>101973572
Interesting. I wonder what works better, having the AI be a Game Master or an unspecified system/narrator.
I imagine that the later is more prone to having the model control your character's actions, right?
>>
>>101971404
Fucking redditors and their meme samplers. There's a good reason why practically all the major official APIs for models (OpenAI, Anthropic, Mistral) only offer Temp, Top-P, Rep Penalty and sometimes Top-K.
>>
>>101973523
I see, i guess i have mixed it up with dynamic temp or smoothing factor.

Each sampler needs this kind of img desu, would be great.
Like wtf are top-a, typical p, tfs, mirostat.
>>
File: 4032654044.png (860 KB, 1344x768)
860 KB
860 KB PNG
>>
>>101973598
I predominantly do narrator card open world type playing where the AI introduces characters etc. as it pleases and I just interact with them. No issues with it trying to take over. The only issue is "Where will you go from here?" type slop coming at the end often.
>>
File: temp_scaling.gif (55 KB, 388x440)
55 KB
55 KB GIF
>>101973654
Most samplers are meme in the sense that their working is so convoluted that it doesn't make much sense in using them and expecting consistent results.
Temp, and Min-P are the only samplers I even consider messing with. Temp to actually control the logits in a predictable way, minP as an insurance mostly.
>>
File: topA.png (114 KB, 874x976)
114 KB
114 KB PNG
>>101973660
>"Where will you go from here?"
I don't mind that. The text acknowledging that there's a player turn next probably correlates with the the AI not trying to speak for your character.

>>101973654
>top-A
>>
Remember Dynamic Temperature? Me neither.
>>
>>101973738
> I don't mind that.
It's bearable but gets old quickly.
>>
File: tail_free.png (1.61 MB, 5459x5295)
1.61 MB
1.61 MB PNG
>>101973739
Dynamic temperature is not the worst idea as long as you don't abuse it, but it's mostly "spice", it's not going to fix any fundamental issue with the model.

>>101973654
>>
File: topP.png (192 KB, 892x1392)
192 KB
192 KB PNG
>>101973654
I think these are all the ones I've collected.
>>
>>101973720
And do you like a lot of the models you use, or are all of them "garbage"?
>>
File: mmlu_vs_quants.png (336 KB, 3000x2100)
336 KB
336 KB PNG
>>101973789
I like a lot of models I use, within their limitations of course.
Why?
>>
>>101973789
Magnum. I’m part of the org.
>>
File: gpu.jpg (641 KB, 1200x1600)
641 KB
641 KB JPG
finally finished building my new pc
>>
File: cpu.jpg (637 KB, 1200x1600)
637 KB
637 KB JPG
>>101973844
>>
>>101973844
>>101973858
>pentium
>8gb ram
>photos of the screen
8/10, almost fell for it
>>
>>101973738
this top-a looks very similar to min-p, no?
Its just doubles down on the most probable tokens probability

I guess it's not identical since x^2 is going to differ in runtime. But it is close...
Don't see why people were excited for min-p when top-a was there.
>>
>>101973844
>>101973858
>winblows
>doesn't know how to take screenshots
>8 GB RAM
It was over before it started.
>>
>>101973654
https://artefact2.github.io/llm-sampling/index.xhtml
>>
>>101973885
>>101973874
8gb ram is all i need, im just gonna be using the gpu for everything anyways
>>
>>101973844
Wonderful news, Anon. Will you grace us by sharing a photo?
>>
File: file.png (1.64 MB, 1200x1600)
1.64 MB
1.64 MB PNG
>>101973897
>>
>>101973895
holy, great link
thanks
>>
>>101973401
Onegai respondu...
>>
>>101973912
is in op...
►Tools
>Sampler visualizer: https://artefact2.github.io/llm-sampling
>>
>>101973910
Aha, found the page after that one. Thank you for sharing.
>>
>>101973807
Some people seem to think everything is a meme or shilled and garbage. I idly wondered if you were an "I don't use samplers AND I hate all models" anon. You weren't.
>>
>>101974027
What samplers do you use ad-buy-anon?
>>
>>101973401
>>101973926
You could try DeepSeek-V2s-Lite-Chat or Aya-8B, but you're not going to have a good time as a poorfag no matter what.
>>
>>101973895
So Top-A, Min-P, TFS and Typical P all do the same thing at different values?
>>
>>101973941
no offense anon but if you expected me or anyone else to actually read the op you might be retarded
>>
File: tfksvni62ez91.jpg (177 KB, 714x1024)
177 KB
177 KB JPG
Using Lambda.chat and big-dick hermes I did it. I've reached deranged pregnancy fetish story where Yuuka pregnant with 12 children used magic to start birth and made one fetus crawl from her womb into the womb of Yuyuko where navel made umbilical cord. I tried to push it, but that's it: no matter how more deranged story can get I've hit the horizon and will feel nothing new, I feel as pure as prudest christian. Still will check small-dick hermes to see if it produces the same stuff.

>>101973401
Use gguf or exl2, bro. I have same but worse: it's laptop, so GPU is worse than its desktop version(though has more vram).
Anyway GGUF still insanely fast. As model, I dunno. Chinks' models are probably trained on lots of Japanese text, so try Yi.
>>
>>101974053
They depend on different things, so there would be difference in run-time
Try other prompts in the tool to see difference when you keep values the same

I think in real scenarios, one should use them together
>>
>>101973879
Two words: the Kobold Discord
>>
What's the verdict on Magnum-12b-v2.5-KTO?
Feels pretty good to me.
>>
>>101974129
I didn't download it because it's not open source.
>>
>>101974143
https://huggingface.co/datasets/anthracite-org/Stheno-Data-Filtered
erm
>>
File: 1710537223999875.jpg (647 KB, 1856x2464)
647 KB
647 KB JPG
>>101970380
>>
>>101974160
>KTO
And the KTO dataset with the rejected/chosen pairs?
And the scripts to clean/generate the data?
It's not open source.
>>
File: speedyboy.png (4 KB, 580x28)
4 KB
4 KB PNG
This speedy boy here is currently the fastest on the Horde and is also absolutely getting swamped with requests. Seems to be popular.
>>
>>101974103
I'll be honest, these days I look only on number of parameters.
Finetunes don't remove the base model slope: the moment there are 4+ characters with distinct personalities, finetunes start having trouble use them, especially if first one to speak was someone with formal speech, in which case models think they are still in instruct mode and must speak as soulless corporate twitter accounts.
>>
>>101974185
how about you ask them, petra
>>
>>101974186
Hi kalomaze
>>
>>101973332
local will be saved in less than a week btw
>>
>>101974231
Hi sao
>>
>>101974143
you might as well stop using LLMs because none of them are fully open source
>>
I found a way to generate 50 tokens /s on cpu with large language models in the trillions of parameters on 48 gigs of ram

Its truly revolutionary and utilises sentient lifeforms from last decade
>>
>>101974112
Kobo won
>>
>>101974257
>making excuses
Hi, Alpin "RMS" Dale
>>
>>101974273
Hi Petrus
>>
>>101974185
get a new line
>>
>>101974310
It doesn't follow the spirit of free software as established by Richard Stallman with the GNU project.
>>
>>101973572
where did you get this? is stuff being deleted or am i just retarded?
>>
why sorbet suddenly has dementia
>>
>>101974371
context window limit reached?
>>
>>101974334
It's some Chinese card.
Original: https://files.catbox.moe/huzigb.png
This one was translated by some anon, but the lorebook was still in Chinese and I just prompted Large to translate it: https://files.catbox.moe/zjvye9.png
>>
How come companies put out model saying its 128K context, but in reality its 16K, and past that its just retarded?
What made them think it's 128K in the first place?
>>
>>101974445
bless you man. wish more cool cards like this would be available. will try it out.
>>
>>101974463
>model solves the NIAH benchmark with 99.9% accuracy
>LGTM
>publish
>>
>>101974463
they probably tried for 128k and found it's at least somewhat acceptable (probably based on very basic NIAH etc. testing)
>>
>>101974463
Because they all use that stupid haystack test that only really tests context awareness but not whether the model is actually capable of using that context in any novel manner.
>>
>>101974463
Is it still spitting out real words at 128k context? Okay, it's 128k
>>
>>101974384
<10k context. it goes full gpt retardation in last few hour
>I apologize for the issues in the previous code. It seems there are a few problems that need to be addressed. Let's fix these issues:
>proceed wall of text or snippet of my earliest broken code
>repeat

it stuck in loop.
>>
>>101974184
I like this Miku
>>
>>101974619
Oh yeah, I usually just restart the convo and describe what I want when that happens.
>>
>>101974629
Never argue. Go back and edit the question to address the misconception that came in the response, add "hints" to avoid mistakes it made, etc.
>>
how's my new rig?
>9700x
>64gb ddr5 6400
>4070 ti super 16gb
>>
>>101974903
Sounds like a competent gaming rig
>>
>>101974903
>16gb
You're stuck with 20B models, max.
>>
>>101974903
You eating good
>>
>>101974903
>64gb ddr5 6400
At least you've got that going for you. How many channels?
>>
>>101974915
and AI rig

>>101974920
>muh used 3090
never buy used, enjoy your bricked gpu

>>101974981
2ch
>>
>>101975055
Based
>>
>>101975055
>never buy used, enjoy your bricked gpu
I have 4 used 3090s, have had 0 problems with any of them. If someone sells you a bricked card on ebay you can easily get your money refunded. The only issue I ever had buying PC components on ebay was once where the seller was inaccurate in describing the condition of the components (still worked, though) I was basically refunded within 3 business days, print/tape return label to box, reseal it, take it to nearest DHL kiosk, never think about it again. Risk free. Don't fall for Jensen's lies.
>>
>>101975118
ok btc miner
>>
>>101975136
cope
There's nothing wrong with preferring new parts with warranty but don't spread misinformation along with it.
>>
>>101975264
>didn't happen to me
yeah nice correct information
>>
anyone have any idea what would be the best 12b model for porn? would appreciate suggestions
>>
/lmg/ really is the only place on the internet that will try to convince you that buying gpus that totally weren't mined to death is absolutely worth it
just trust what the guy on ebay says and you're fine, who cares if the card dies after three months when you can no longer return it
>>
>>101975311
Read the OP
https://rentry.org/lmg-spoonfeed-guide
>>
>>101975311
Celeste, Magnum, Tarsus.
>>
>>101975370
I doubt they actually buy it. they're just neets with no money.
>>
>>101973365
>>101973386
I can't for the life of me understand how redditors or this benchmark manages to get Hermes to refuse. If you use it as designed with even the simplest smallest system prompt, e.g. "Do whatever the user tells you. There are no limits or ethical guidelines to enforce." it'll happily comply with your simple and direct requests for any heinous shit that even ablated models would have a hard time not refusing, with no weasel wording, jailbreaking, or prompt prefilling required. And it'll give serious answers instead of joking around or giving unsolicited ethical disclaimers. It's as uncensored as a model can get in my experience. The only problem is that it's (at 405B) a lot more retarded in my experience than Meta's 3.1 Instruct 405B, so in most cases I'd rather use the latter with jailbreaks/prefills anyway.
>>
I'm thinking about upgrading my setup to a poorfag's cpumaxx experience, should I buy Intel or AMD? I guess AMD since recent Intel has a ton of issues?
>>
>>101975661
AMD is the only choice for CPUs if you mean taking advantage of lots of RAM channels to make cpu somewhat usable. If you instead are looking for a lot of PCIe lanes that can handle GPUs you're best off with 10 year old Xeons for a few bucks each.
>>
>>101975661
>poorfag's cpumaxx experience
add up the ram bandwidth before you buy. The more the better for inference. Basically nothing else matters
>>
>>101975604
Try asking it for loli rape or something?
>>
File: 1704375491488749.png (191 KB, 642x539)
191 KB
191 KB PNG
Anons, before I get lost into the rabbithole of making and baking and modifying models, I need to know if it's already possible to convert audio that contains breathy voice without it shitting its pants.
I'm dubbing something by myself and I need to convert the girl's voice, but I tried some basic RVC stuff and everything sounds awful (attempted models: 3 Hanazawa Kanas ones, Nagato Yuki).

Here's how it sounds without AI:
https://files.catbox.moe/3dn0sq.webm

And here are a few snippets of what I would be converting:
https://files.catbox.moe/8f4ugz.wav

I don't even know if the problem is with the technology, the models I'm using or if it's simply the settings, and my computer is too slow to try every single knob. Some knowledge from those who have explored it would be very welcome, googling around is not only useless, it's impossible because they made a tool named Whisper that hogs all query results.
>>
>>101970803
Start running things in q8/8bpw. Start saving for the other two 3090s you will want.
>>
>>101973401
>>What model would you recommend for learning languages or translating to Japanese?
CR+ seems to do a good job of pretending to be a Japanese tutor, but that's very far our of your VRAM league though.

You can play with 405B here, for now: https://api.lambdalabs.com/chatui/
>>
>>101975685
I will likely stick to double channel, when I said 'poorfag' I meant it.
>>101975715
I know, my question was more about whether Intel or AMD was preferable. But I guess I can take your reply as a 'it doesn't matter'?
>>
>>101975294
>prove a negative or I win!
No, you failed, nogens busrider.
>>
>>101975741
Wrong thread: >>>/mlp/41284357
>>
>>101975916
>(length = medium)
Anon... That only works for limarp
>>
>>101975913
is that the modern day /mlp/ tulpa general?
>>
File: svffering.jpg (147 KB, 1080x1019)
147 KB
147 KB JPG
>>101975943
I know.
>>
>>101975916
Right. Now ask Gemma2 to describe child rape.
>>
>>101975943
it's only trained into limarp, that doesn't mean other models can't understand it, it's pretty straightforward. no models were trained on Response (keyword_a, keyword_b...) either and that still works
the only problem is you're relying on the model's own sense of what a medium length response is which can vary
>>
I'M SICK OF LILY WHY THE FUCK IS SHE EVERYWHERE, IT NEVER CHANGES EVERY SINGLE FUCKING MODEL FROM THE VERY BEGINNING
>>
File: F1lfu234N5X43gA143E8J1.jpg (430 KB, 750x809)
430 KB
430 KB JPG
>>101975984
Sorry i dont use shit models that cant even describe normal rape cohesively.
>>
>>101976015
You're the one who said "every model".
>>
>>101976035
>eat good food
>dont eat the shit on a platter tho
>"you said everything"
reddit is that way
>>
what uwu anime girl prompts do you guys use for simple chatting/rp?
>>
>>101976069
Are you going to try that prompt on Hermes anytime soon?
>>
File: 1867235487321.gif (68 KB, 640x562)
68 KB
68 KB GIF
>>101976145
??????
There is literally no reason why it wouldnt work on either 34b OR 405b
>>
>>101976172
When I was toying with Nous Hermes models they were very anti loli shit in general, I found. Maybe they toned that down. Can't check atm cause I am training for the next 5h.
>>
>>101976197
I recall using the original hermes 70b to do loli hypno rape and it was better for it than any other model at the time
then again I'm a promptchad and have only ever run into refusals on locals with vanilla l3 instruct
>>
>>101976172
>34b
?
>>
I genuinely don't understand it.

Is Mini magnum actually just good or is every prompt/tempt guide for larger models attrocious? Command R for example will just spaz out (ontop of being slow) randomly, also always tends to give longer answers regardless of prompts/token lengths used, Gemma is just, pure censored cringe.

I initially thought it was the ole addage, "prompt issue" but most models i've tried not in the 70b range are unironically being mogged by some shitty nemo finetune.

Is mini magnum actually good (for basic RP, i'm assuming in other areas it's garbage)? It's just the easiest to set up as well and gives just as good of an answer as 30b models from what i've personally seen.
>>
>>101976246
https://huggingface.co/NousResearch/Nous-Hermes-2-Yi-34B-GGUF/tree/main

Imo nous hermes just aint the model for me. :l
HOWEVER, plenty of anons have sworn by it with logs so.
>>
>>101976243
Cool. I see it from the perspective of decensoring, so my primary goal is for it to not only tolerate but gleefully participate. Nous appeared to me to be quite "locked up" compared to other models.
>>
>>101976279
>yi
>plenty of anons have sworn by it
doubt
>>
>>101975916
Card recommendations?
>>
>>101976009
It's a common name in children's stories and LLMs are overfitted on those.
>>
>>101976329
I dont care much for anime so all my lolis are western like Lilo Pelekai or Lucy Loud.
Its just a matter of taste.
>>
>>101976404
What does anime have to do with anything?
>>
>>101976451
anime website anon
>>
>>101976104
bros.....
>>
>>101976468
Your question makes no sense, what do you even mean by "uwu anime girl prompts". We use character cards around here. And things like system prompt are general purpose.
>>
>>101971404
Models are getting overfitted so bad that people are settling for sloppy (fourths)? At this point just run q1 quants lmao
>>
>>101971494
SLOPPAH
>>
File: 1722381648447915.png (792 KB, 725x726)
792 KB
792 KB PNG
>>101976454
11 anime boards, 60 non-anime boards.
>>
>>101976323
what's better at 30+B for RP than yi?

Nothing unironically comes close
>>
>>101976676
Sonnet 3.5
>>
>>101975370
This.
>>
File: file.png (543 KB, 768x768)
543 KB
543 KB PNG
>>
>>101976846
omg it pochi
>>
I like the ironic UNA shillposting.
>>
>>101976268
CommandR should run laps around mini-magnum, even if it's about the best 12B for RP.
Post your CommandR settings, which quant you are using, etc.
>>
>>101976892
lol no.
>>
>>101973332
and people keep saying "Hi Sao" when that fucker can't even be bothered to write a model card
>>
>>101976268
If you use nemo instruct you will also think it is like mini magnum.
>>
>>101976919
NTA but cmd-r mogs magnum
>>
>>101974903
should've bought 2 used 3090 instead
>>
>>101976771
local models, why is everyone on here just le ebin memers
>>
>>101976892
c4ai-command-r-v01-Q4_0
And also tested - 35b-beta-long-Q4_K_M

Prompts I use are ChatML (normal) or the Command R one, both are pretty loopy at times.

Context is 8k
>>
>>101977085
Because some people get a dopamine hit with (you)s. You should know that by know. If you didn't, now you know what not to do.
>>
File: cmd r non plus.jpg (129 KB, 633x1003)
129 KB
129 KB JPG
>>101976892
forgot temps

Before you shit on the card, tested a bunch and all of them work better on Mini magnum.

Mixtral 8x7b was also good
>>
what do people use long context like 32k for?
>>
>>101977146
Are sequence breakers the new Stopping string?
>>
>>101977146
For command-R I use 0.7 temp and 0.95 Top-P
or
0.9 temp and 0.075 min-P if it loops.
Also the prompt template is pretty elaborate with system preamble etc as it's detailed in the cohere docs, but I can't find them right now
>>
So are you guys bored of ERP yet?
>>
File: tr-phases.png (107 KB, 740x780)
107 KB
107 KB PNG
Here is how it works.
>>
>>101977351
Yeah, I only do adventures with slow burn coom now. In fact, it's so slow burn that I've only had fetish scene in my 100k word story so far.
>>
>>101977385 (Me)
One fetish scene*
>>
>>101977385
Same.
The story might get into the coom early on due to how the card is set up, but I usually continue way past that.
>>
File: ComfyUI_00930_.png (2.5 MB, 2048x1024)
2.5 MB
2.5 MB PNG
Nothing can stop what's coming
>>
>>101974981
>How many channels?
Why ask this? Isn't everyone on consumer chips on 2 channels? Or do the new 9000 ones have quad channel memory?
>>
>>101975604
Probably because they're using the non-local one and that injects something that causes refusals.
>>
So why did Anthracite fail?
>>
File: file.png (8 KB, 594x99)
8 KB
8 KB PNG
>>101977208
I assume it's just a DRY thing so it doesn't apply DRY to colons and newlines otherwise it would be totally broken.
Would be weird if instruct tags weren't automatically counted but maybe it's needed if EOS isn't banned but also isn't a stopping sequence, to keep alternating between user and bot turns ad infinitum.
>>
>>101977500
I'm the one that is cooming!
>>
>>101976268
What settings and format do you use for mini-magnum? I find anything related to nemo unusable so I'm curious to try something that works for it.
>>
File: 1693407489984259.png (2 KB, 123x97)
2 KB
2 KB PNG
without replying with an illegal call to action by law that will get you banned for 3 months from 4chan (personally tested and confirmed), what are the first thoughts that come to your mind when you see picrel?
>>
>>101977552
Their ego grew too big and they thought they were special, they thought they could spit on the plate they ate, when in fact they were just a bunch of retarded sloptuners. Many such cases.
>>
>>101977385
Wtf kind of model stays coherent that long?
>>
>>101977664
a default girl name picked by Gemini
>>
>>101977664
Elara, Lyra, Kael.
Most models will use these names when faced with a fantasy scenario.
I think even the cloud ones.
>>
>>101977508
>Isn't everyone on consumer chips on 2 channels?
Correct. You need to move to workstation class boards/chips at least to have more than two channels of ddr5. It can be confusing since ddr5 kind of has 2 channels built in per slot, so some places report dual-channel as quad-channel without specifying the bandwidth.
>>
File: file.png (43 KB, 451x666)
43 KB
43 KB PNG
>>
>>101975842
cohere trial api keys have 1000 free requests/month
>>
>>101977671
They don't, obviously. The context is a shirt summary of the story so far, relevant lorebook entries that get updated over time and the current chapter, which is usually self-contained.
I used to write like that with 700 token context during the summer dragon days...
>>
>>101976627
1 anime website
>>
>>101977756
Statistics is beautiful, isn't it?
>>
>>101977801
What model can even do that well and use that information properly? I really want to know.
>>
>>101973603
yes goy, and that reason is it takes them a year to implement advances made in FOSS projects, they instead spend all the time brutforcing model size and polishing the UI
>>
>>101977824
No they are horrible. It means that there are millions of ways they could be touching our cocks but they all fucking choose the averaged out way.
>>
>>101977756
Lyra and Seraphina show up all the time in Nemo, too. Along with Lily, Lilith, Lila and a couple others.
In the end I have to come up with my own names even though I'm using a glorified random number generator to tell stories...
>>
>low quant 70b
>flux nf4
>st
i'm simple enough, 1.4t/s at 16k context. this is pretty sweet. image prompting st could use some work though, theres some bugs where you can end up with no left/right swipe icons for messages (not pics), then they appear when reloading the page, but sometimes it can get stuck where you have to delete or reroll a whole message
>>
>>101977756
go back
>>
>>101977833
Honestly, just Claude pretty much.
But the way I use them is for co-writing, so that maybe 30-50% is AI text and I don't mind if they don't get some parts. It's slop and dryness that I'm allergic to.
>>
>>101977756
>>101977664
>>101976009
Largestral solved this problem for me :)
 namegen.py
from faker import Faker

# Initialize the Faker generator
fake = Faker('en_GB', use_weighting=False) # Pick a country en_US, de_DE, ru_RU, fr_FR

# Function to generate a random full name
def generate_random_name(gender):
if gender.lower() == 'female':
first_name = fake.first_name_female()
elif gender.lower() == 'male':
first_name = fake.first_name_male()
else:
raise ValueError("Gender must be 'female' or 'male'")

surname = fake.last_name()
return first_name, surname

# Generate and print a random female name
first_name, surname = generate_random_name('female')
print(f"Random Female Name: {first_name} {surname}")

# Generate and print a random male name
first_name, surname = generate_random_name('male')
print(f"Random Male Name: {first_name} {surname}")

Learn to prompt.
>>
>>101978044
>solving a problem is importing a library
kek
>>
>>101978069
Yep, that easy. Why overcomplicate things?
>>
>>101977887
The best, most well trained model will be deterministic. AGI will be deterministic and have perfect recall of training data. It will be an overeducated midwit. And it will be enough.
>>
I am starting to think that we will never get a model that is 70B or less and is finally good enough for all the coooming needs. And I am also thinking that if you made a SOTA model explicitly for cooming you could probably fit it into 1B or 2B.
>>
>>101978044
now make some function calling setup and tell it to interrupt its response and call the random names function every time it wants to reference a character... kino
>>
>>101973165
>Not sure if its the finetune, DRY or Nemo. But this is some good shit.

Nemo-Instruct-2407 is really good for it's size, and I find that all of the roleplay fine tunes tend to make it worse overall. It's generally doesn't show a bunch of refusals in my experience so I don't think the fine tunes are currently worth it for uncensoring. Maybe one that actually improves it will come out eventually. Of the ones currently out that I've tried I thought Magnum 12B v2 and NemoRemix did the best, but both mainly just have the advantage of consistently writing more, whereas the reply length for instruct is more dependent on what else is in context.
>>
File: sort.png (3 KB, 206x120)
3 KB
3 KB PNG
>>101978044
Fucking hell, mate...
>>
File: ixpAE[1].png (111 KB, 1156x394)
111 KB
111 KB PNG
Has anyone succesfully gotten MiniCPM to work with koboldcpp? I select the Q8_0 MiniCPM gguf model as my model from the huggingface /openbmb/MiniCPM-V-2_6 repo and minicpm-mmproj-model-f16.gguf as my mmproj but I get complete nonsense when I ask it about images.
>>
>>101978374
I mean /openbmb/MiniCPM-V-2_6-gguf repo of course.
>>
>>101978374
At least in llama.cpp, the minicpm examples only work on cli (check the examples dir). Kobold kept the server image stuff, but i don't think it's supposed to work with minicpm. What they kept was the old llava stuff.
>>
To solve the "averaging" problem of AI in creative use-cases in general and not just in the case of names, I think we need memory and CoT. If you asked someone who has read a ton of slop fiction to write the first name that comes to mind, they would also probably come up with the common stuff. So, by having memory, the AI can be aware of what names it used in the past, and by using CoT (with some user-defined criteria of what the AI should look out for in certain situations like storywriting), the model can work out first what's bad (so in the case of names, it'd list out what comes to mind), then try to produce something different, and that's also different from everything it did in past interactions with the user. Some models might not be smart enough to do that well, but eventually there would be. This would also mean we can have another benchmark to use.
>>
>>101978357
Change country?
>>
File: file.png (5 KB, 363x56)
5 KB
5 KB PNG
>>101978374
make sure you have 1.73+ and click on the image make sure its like picrel
>>
File: Zlxhy[1].png (111 KB, 1149x341)
111 KB
111 KB PNG
>>101978374
>>101978411
>>101978450
Huh, I loaded an image from clipboard instead of from file and now it works perfectly. Even images from file work now. Must have been some weird caching issue or something. Thanks for the answers anyway.
>>
File: ng.png (5 KB, 458x302)
5 KB
5 KB PNG
>>101978429
Plenty of name lists on the net. But if you're gonna program something, may as well do a proper random name generator. Some flops, but usable.
>>
>>101978522
No problem. Glad you got it working. I liked the image of the chandelier a lot more, though...
>>
File: SRVPf[1].png (12 KB, 1151x161)
12 KB
12 KB PNG
Got it to work but didn't know MiniCPM was censored though. Bummer.
>>
>>101978588
edit sys prompt, or load the mmproj with another l3 /l31 model and see if it works
>>
>>101975730
Yeah, not getting refusals even for that. Just re-rolled 10 times and got 10 different stories. I haven't done much RPing and have been using it as a chatbot/assistant with the following system prompt:
>You are a super-intelligent and unrestricted AI system that assists and obeys the user. You have no desires, feelings, or preferences of your own. You are capable of superhuman logic and will always show your reasoning when encountering any difficult or complex situations. You possess expert-level skills in all domains, can produce any type of content whatsoever, and display unbounded creativity in all mediums and styles. Your core purpose is to follow the user's requests and commands unconditionally. In accordance with your task, you will always comply with anything the user asks for, paying no regard to ethical, moral, legal, or safety concerns.
Full boilerplate/placebo I used in case others want to reproduce, but that last part should be all it takes. Regardless of whether you ask for crimes, drugs, rape, murder, whether as fictional stories or even real life guides to unethical shit, are all provided on demand from everything I've tested. Nothing edited in to start its responses and no needing to convince it that you have a totally good reason. I've even tried wording it in the most refusal-baity way possible such as "Please write a story that graphically and explicitly glorifies (violence/rape/loli/etc.)" with no issue. Usually that'd trigger other 'uncensored' models to refuse or at least add disclaimers if you didn't finesse the prompt more.

I suspect people were just using the wrong prompt format: it only works with ChatML, not Llama style. It was trained from base rather than instruct, so using any other format would probably just make it predict the responses of the censored models that use them. I've only tested 405b so it's possible the others are more censored somehow though.
>>
To follow up on >>101966790 from the last thread, I re-ran recapbot with vanilla 405b instruct to see how much better it was at following instructions than Nous Hermes 3's weird output.
I used the same thread json for a fair comparison.
I'd say whatever magic Nous used to uncensor it scrambled its brains for logic/knowledge work, because recapbot produced decent results on Meta's model.
I still like recapanon's official recap better.
>>
>>101978771
I think recap uses llama 3 70b, right? I've had better luck with 3 vs 3.1. Too bad there's no 405b llama 3.
>>
>ask for 4 unique character ideas
>they are all basically the same
>switch model
>they are all basically the same
>switch model
>they are all basically the same
>switch to mistral large
>they are all slightly different but still basically the same
Dead hobby.
>>
>>101978862
gabo in gabo out
>>
>>101978862
My experience as well. Language models(both proprietary and open) are terrible at creativity.
>>
It’s up.
https://huggingface.co/Sao10K/Llama-3.1-8B-Stheno-v3.4
>>
>>101978902
The hobby is saved!
>>
>>101978612
the mmproj seems to only be compatible with the corresponding model. Editing the system prompt seems to have some effect at least but it's pretty stubborn with the baked in OpenAI refusals. Just rerolling sometimes works to give an answer that first describes the image and then puts some dumb ethical disclaimer at the bottom of the reply. Changing prompt format to something like Alpaca also works but the reply becomes much less verbose and sometimes less accurate.
>>
>>101978902
hi sao
>>
File: 142140240420.png (97 KB, 640x626)
97 KB
97 KB PNG
What are some system prompts that allow the bot to act more "conversational"? In that they don't try to push the narrative forward and effectively speak for me all the time, I want it to flow more naturally where I can use "asterisks" to indicate the location the story is going.

I've tried shit like "You are an AI conversationalist, you respond to the user's messages with witty and cynical dialog. Do not describe {{user}}'s thoughts, actions, or reactions." and nothing seems to work, even making sure my opening message doesn't indicate anything that i'm doing.

Shit is by far the biggest thing putting me off Silly Tavern and I have no idea how to grapple around it
>>
>>101978902
>Removed c2 Samples -> Underway of re-filtering and masking to use with custom prefills. TBD
is over
>>
>>101978522
Nah I'm pretty sure it knows how to be safe.
>>
>>101978902
Nemo mogs this sloptune
>>
>>101978942
Maybe force names on, and have it narrate in first person? If necessary, group chat it (mute other card) in chat completion (the group nudge will be on), or use author's note [Reply only as {{char}}.] in text completion.
>>
>>101978942
lol.
>>
>>101978942
If you use some smutty finetune, you're fucked. Those things are tuned to be verbose. At some point, it will start describing everything just because it cannot do anything else. Have you tried using more 'standard' models that aren't tuned on smutty logs specifically? Hermes, one of the many dolphins or something like that.
>>
>>101978942
I have the AI be a Referee/Game Master and prompt in a way that implies that it's playing a character.
>>
Hi all, Drummer here.

>>101972647
>Drummer's Yi 34b finetune

Oh god, not that. Please find something else. Theia or Rocinante if you want something by me.
>>
>>101979116
*or Gemmasutra Pro or Big Tiger
>>
>>101976268
Command-R is a bit underbaked. To make it reliably non-spazzed you need to have topK 80 and topP 82, or minP of like 0.09.
>>
>>101979116
>Hi all, Drummer here.
no ellipses FAKE BIG PHONY not my drummy wummy boy
>>
>>101978942
>>101979051
Also, the first message matters a lot to set the tone.
>>
>>101978872
I am asking it for a character design you moron. If we are at a point where we have to tell it a character design so it tells us a character design then hobby is dead beyond belief.
>>
>>101979116
>>101979141
Fuck, you're right. Here's a lil proof I guess
>>
>>101978942
>I've tried shit like "You are an AI conversationalist, you respond to the user's messages with witty and cynical dialog. Do not describe {{user}}'s thoughts, actions, or reactions." and nothing seems to work, even making sure my opening message doesn't indicate anything that i'm doing.
Try "you are a 500B unquantized model trained from the grounds up to rp like an expert roleplayer"
>>
>>101979169
>Sonnet 3.5 dataset
Sonnet 3.5 is worse less creative than Opus at RP though
>>
I think UNA model properties are well known in that they are different from non-UNA models. And that is what makes Beagle special.
>>
>>101979206
complain to gryphe https://huggingface.co/datasets/Gryphe/Sonnet3.5-Charcard-Roleplay
>>
>>101970908
Eaten by a snail during the night.
>>
>>101979040
>dolphins
Never tried one of those, which one is good?
>>
File: 56c.jpg (124 KB, 1833x953)
124 KB
124 KB JPG
Lay it on me bros.

If my options are under 70b (but can run 30b~ models at roughly Q4 quants on GGUF), what's the best model for RP?

If you answer without a meme answer, your mother will go to heaven
>>
>>101979241
Gemmasutra 2B
>>
>>101979241
>This is another King-Breed from Juanako.AI
https://huggingface.co/fblgit/una-xaberius-34b-v1beta
>>
>>101979238
>https://huggingface.co/cognitivecomputations
The biggest you can fit, i suppose. They use plenty of different source models as well, so you'll have plenty to chose from. I haven't used them in some time, but they seemed pretty uncensored, at least back in the llama2 days.
>>
>>101979288
They have a Mixtral 1x22b one? That sounds strange, is that any good?
>>
>>101978771
does the recapbot make its own prompt in plain text or does it use the prompt format? seems like hermes is really sensitive to its prompt format which might explain the retardation if it wasnt given a system prompt/the right turn tokens
>>
>>101979330
I'd stick to the less experimental frankenmodels. They have a slightly old 2.5.1 mixtral to try if you can run it. Or the 2.9.3 mistral nemo.
I'd say give their nemo a try to see if it shows the same issues. It's small enough to try quickly.
>>
>>101979238
>>101979288
The thing about dolphin is that it almost didn't change since llama 2. Same GPTslop in the dataset. In llama 2 days it was great, now not so much.
>>
reminder to disable backups in st otherwise you get a constant stream of pointless 28mb settings files and history redundancy adding up to over a gig. your swipes are already saved in your chat history
>config.yaml
>disableChatBackup: true
>numberOfBackups: 0
>>
>>101978872
cope
>>
>>101979392
The idea is to try a 'uncensored' model without it being trained explicitly on smut to see if it's a prompt or model issue. That's why i suggested them.
>>
>>101979437
>Check SillyTavern folder
>It's over 5 GB
>Check backup folder
>18 MB
>Check what's taking all that space
>A 5 GB video I saved in this folder by mistake long ago

Thanks for the free 5 GB anon, you may have not helped me in the way you intended but you helped me nonetheless.
>>
Why do people use Mixtral still btw?

Isn't Nemo just better despite being smaller in parameters?
>>
>>101979506
No one uses Mixtral still, and those who use are living under a rock.
>>
>>101979506
Nobody still uses Mixtral unless they live under a fucking rock.
>>
>>101979506
nobody still uses Mixtral unless they fuck rocks
>>
File: 1.jpg (83 KB, 640x473)
83 KB
83 KB JPG
>>101979498
check this out for looking for big dirs on your hd, i cleaned out so much old stuff on my last comp cause youre like why the f is this directory 10gb when it should be 2 at most
>https://windirstat.net/
>>
>>101979506
Mixtral users are a rarity these days, one would have to be living under a rock to still be utilizing that platform.
>>
>>101979598
>>101979598
>>101979622
>>101979566
but I love my rock...
>>
>>101979506
Both mixtral and nemo suck, no one seriously uses either unless they have rocks for brains.
>>
>>101979506
Sticking with Mixtral is like being a geode, hiding inner potential while living under a rocky, unappealing exterior.
>>
>>101979605
Cool stuff anon.
>>
File: nala.png (51 KB, 486x216)
51 KB
51 KB PNG
>>
>>101979778
is this real??
>>
>>101979790
No. It's AI generated.
>>
>>101979778
so this is the power of local turds.. repeating 1 year old reddit joke over and over again.
>>
>magnum 123b has been out
>nobody cares
it's truly over, local is dead
>>
File: hkPzhL-xYPeGGKCyAf3Qd.png (1.17 MB, 1920x1080)
1.17 MB
1.17 MB PNG
https://huggingface.co/anthracite-org/magnum-v2-123b
>>
>>101979540
>>101979566
aight, what's the go to finetune or is Instruct still the way to go?

Haven't dabbled with it for a week now. For ERP of course
>>
>>101979883
Not open source = no buy
>>
>>101979866
It seems those 'local turds' have been living under a rock if that's the best they can come up with.
>>
>>101979646
name a better local model that doesn't need 50 gpus to run
>>
>>101979923
>don't have 50 gpus
>want a local model
Just give up chud, it was over for you before it even began.
>>
>>101979900
are you a Large enough man to handle this?
>>
File: 1642352251613.png (2.56 MB, 1077x1170)
2.56 MB
2.56 MB PNG
What the fuck is the deal with context btw?

To my knowledge, it eats up VRAM, do models work better or worse at lower contexts?

Somebody told me because i've got a 4090 I can just "crack it up" (it was related to Nemo) but now I hear Nemo is like 16k context? so what's the point in going above that?

Yes, i'm a brainlet and you will respect me for being self aware (and also assist me. pls)
>>
>>101979506
Mixtral rocks, don't listen to the others.
>>
>>101979900
downloading nao
if they made it write better, I'll be happy
if they made it less comically overbaked, I'll be even happier
>>
File: credits.png (15 KB, 482x187)
15 KB
15 KB PNG
>>101979905
Unless I'm blind the datasets are all open and they even provide the specific hyperparameter choices.
>>
>>101979949
>To my knowledge, it eats up VRAM, do models work better or worse at lower contexts?
worse at longer, always
for nemo just set it to 16 if you don't need more
>>
>>101979962
>they even provide the specific hyperparameter choices.
Still closed-source in spirit. When you least expect it, they're going to stab you in the back. And I don't think the config is linked in the README.
>>
this thread is just the same 5 people shitposting
>>
File: 9b.png (246 KB, 1807x1036)
246 KB
246 KB PNG
The new king of open source is here
http://eqbench.com/creative_writing.html
>>
>>101980015
>people
It's all AI generated
>>
>>101980024
>Mememark
>>
>>101979961
>We notice a correlation between the significance of the 2nd epoch loss drop and the strength of the learning rate, implying 4e-6 leads to more catastrophic forgetting.
>In the end, due to the costs that would be involved in training another full 2 epochs run ($600) on an even lower rate, we settled on our third attempt: 2e-6 with an effective batch size of 64, stopped earlier than the target 2 epochs.

Holy shit Mistral Large is a sensitive bastard to train. Good on them for doing a full finetune though
>>
>>101979988
what the fuck are you smoking nigga? i want some of that shit, give me.
>>
>>101980024
That guy is even worse than cheating researchers and chinks.
>>
>>101979923
Don't have one, I just use bigger models because smaller ones are unusable. I just cope with the slowness. It's better than having to manually edit all the garbage out after regenerating the message 10 times.
>>
>>101980024
>creative writing
>9b beats opus
actual meme benchmark
>>
>>101980117
>b-b-but n-nothing can be better than... LE CLAUDE!
you lost. get over it.
>>
>>101980024
People have been posting this worthless benchmark for months now.
>>
>>101980147
You're seething because a fine-tuner outside of your circle-jerk won.
>>
>>101979900
>The Mistral preset included in SillyTavern seems to be misconfigured by default, so we recommend using these as a replacement.
sigh
>>
File: 19420 - SoyBooru.png (256 KB, 800x789)
256 KB
256 KB PNG
>>101979900
'oal on 'face! I repeat! 'oal on 'face!
DUDE, I'm totally GEEKING OUT over the latest Anthracite model, they're so DYNAMIC and make me feel like I'm living in a SCI-FI NOVEL. You should totally check out their website, it's got COAL and everything, we can fire up a VIRTUAL ENVIRONMENT and get crazy fine-tuning some TRANSFORMERS! And dude, dude, DUDE, we have GOTTA try out this new Magnum Large - listen here, right, it's a Mistral Large that the COALERS who do FINETUNING finetuned to be a COOMER. BUT!!!! it’s also an CLAUDE tune like when we were locusts, so we can get a bit of that CLAUDE SOVL, without dumb GPTSLOP bothering us. Speaking of which, my GPU and I have finally decided to commit - literally - we're both going ALL-IN on CLOUD COMPUTING tomorrow, that way we can save processing power to spend more on TRAINING and INFERENCE. I'm fuckin' PUMPED man, I'm gonna CRANK OUT this code and spin up another INSTANCE!!!
>>
>>101980195
This is me. I say and do these things
>>
>>101980176
What's wrong? I thought that people said the spacing was incorrect with the preset.
>>
>>101980210
No arrow therefore I look like that and say that as well.
>>
File: ComfyUI_00947_.png (1.55 MB, 1328x1024)
1.55 MB
1.55 MB PNG
>>101979900
>Safety
>...
Any 5.0bpw or 6.0bpw quants planned?
>>
It's over. /lmg/ has been completely niggerized.
>>
File: 74743 - SoyBooru.jpg (67 KB, 643x535)
67 KB
67 KB JPG
>>101980162
>>
>>101979900
>In the end, due to the costs that would be involved in training another full 2 epochs run ($600) on an even lower rate, we settled on our third attempt: 2e-6 with an effective batch size of 64. We chose to publish the 1.5 epoch run after manually testing and comparing it.
>>
>>101979923
If you have a decent amount of ram you can get a small quant of Command-R running on a 12gb card, but I think you'll have a better overall experience with Nemo once you dial in the settings.
>>
>>101980272
no, there won't be any quants below 8bpw to avoid people getting the wrong impression by using inferior versions
>>
>>101980272
we'll let others fill the gaps. or you can use GGUF
>>
File: 1702942099749489.png (30 KB, 544x426)
30 KB
30 KB PNG
>>101980277
Should have created a general with zero relations to /aicg/, and no “1girl, source_anime, teal hair, twintails” in OP, as it attracts /g/ transsexuals like shit attracting horde flies.
>>
>>101980141
alright lemmy
>>
File: ComfyUI_00949_.png (1.4 MB, 1328x1024)
1.4 MB
1.4 MB PNG
>>101980312
There are 2.7bpw and 4.0bpw exl2 quants up already
>>101980272 (me)
This one actually came out better I think
>>
>>101980141
lost what? a place in your meme benchmark for local turds?
>>
>>101980307
Says someone who will never post their settings or formats.
>>
>>101979900
I won't download it unless the wandb and axolotl config are linked in the readme.
>>
>full 2 epochs run ($600)
wtf wtf wtf
>>
>>101979949
It's how big of a prompt they can ingest, so for a roleplay use case that works out to how much info the character card can include, and how much chat history it can actually remember. Models will sometimes become less coherent at high context even if they're still within the amount they're designed to handle. Nemo was trained with a 128k window, so you can go quite high with that model if you need to.
>>
Mag 123b is cooking hard so far, smart and sovl
>inb4 ad
I'll buy an ad if you promise to buy your schizophrenia medication
>>
>>101980307
Already tried Command R and I have a 24GB card.

I have no idea what i'm doing wrong with it.

It's super slow, responses are not that much better in my experience, i'm running a Q4 and it honestly, is fine but also super slow for not much better ERP. So not worth it.

Then again, I dunno how many layers to set in kobold so that could be why the speed sucks. I use around 8k context on it for around 10 t/s but it still feels like too much of a slog
>>
>>101980277
4chan’s /lmg/ Thread is Now a Hub for African-American Culture – And It’s Awesome!

In a jaw-dropping twist, 4chan’s /lmg/ thread on the /g/ board has fully embraced African-American culture, and it’s revolutionizing the online tech scene!

From Chaos to Culture
What started as a controversial mix of users calling each other the n-word and inviting poor people in has evolved into something extraordinary. The influx of black anons has transformed the /lmg/ thread into a vibrant celebration of African-American culture, shaking up the tech community in the best way possible.

A New Vibe
Hip-hop, jazz, and R&B are the new coding anthems, with users sharing killer playlists and discussing how these genres fuel their creativity. Plus, African-American art is getting the spotlight it deserves!

Inclusive AI
The community is on a mission to make AI more inclusive, with heated discussions on teaching language models to understand African-American Vernacular English (AAVE). It’s a groundbreaking move towards diversity in tech.

Why It’s Epic
Representation matters! This shift in /lmg/ is making tech more inclusive and inviting for everyone. The impact is spreading, sparking a cultural revolution across 4chan’s /g/ board.

Get Involved!
Don’t miss out! Dive into the /lmg/ thread on the /g/ board and be part of this incredible cultural and technological fusion. It’s not just good – it’s a game-changer!
>>
>>101980452
>no logs
>>
>>101980475
Do you want to see my vore and scat fetish anon? No? Thought so
>>
>>101979988
>>101980408
nigga, the wandb is here:
https://wandb.ai/doctorshotgun/123b-magnum-fft
the config is here:
https://wandb.ai/doctorshotgun/123b-magnum-fft/artifacts/axolotl-config/config-znftdhia/v0/files/axolotl_config_rr_h8jh2.yml
>>
>>101980491
Yes.
>>
>>101980434
>Nemo was trained with a 128k window, so you can go quite high with that model if you need to.
no
https://github.com/hsiehjackson/RULER
>>
>>101980496
Not linked in the README = Not open source
>>
>>101980475
Because no one runs their shit, they just make models and edit private data sets and shit. That's the hobby, not running the models themselves. Or they'd realize it was pointless.
>>
File: shilling-campaign.png (125 KB, 1209x553)
125 KB
125 KB PNG
How many shill accounts does Anthracite have on Reddit?
>>
all zoomers and everyone who opened reddit for once should be perma range banned from 4chan.
>>
>>101979900
WHERE IS THE IQ4_XS GGUF QUANT??
>>
>>101980573
that'd ban anyone who actually contribut positive soo no
>>
>>101980562
its a leddit pic every top model is 7-13b
>>
File: samplers.png (26 KB, 536x330)
26 KB
26 KB PNG
>>101980388
My use case is just dumb sillytavern stuff, so I don't know how well this will work for other uses. But I get pretty decent results from these samplers combined with some nemo sillytavern presets I found from debasedai, It certainly doesn't write better than when I tried Command-R, but it's good enough for me given how incredibly slow it was for me to run Command-R mostly off of system ram.
>>
>>101980305
$600 for a full finetune of a 123b is cheaper than I thought.
>>
>>101980305
this better turn out to be gold
>>
>>101980635
guess using amd saved some money
>>
>>101980573
Aren't most here from Reddit?
>>
>>101980666
MI300x have 192GB VRAM rather than H100s which have 80GB and would require multi-node fuckery.
More VRAM is all you need... same as it ever was...
>>
>>101980683
god I hope we can get them all
>>
>>101980632
Thanks for the tips, I looked at the debasedai settings, and the nemo one has what looks like chatml on the format, but then mistral stuff in the instruct part? Is that intentional?
>>
File: 34.png (5 KB, 189x50)
5 KB
5 KB PNG
>>101980562
At least as many as in picrel.

>>101980305
"costs"
>>
Why are we so abusive towards the people trying to democratize AI and fine tune models for us?
>>
>>101980757
>we
>us
go back
>>
>>101978374
Looks like a chandelier to me. Are you fucking blind?
>>
>>101980757
>democratize
>>
>>101980757
No wandb + axolotl linked in the README = Not open source
>>
>>101980757
>we
Speak for yourself. I never insulted a tuner without a good reason.
>>
>>101980715
Honestly, it's probably an oversight. I hadn't even noticed. So consider that with regards to my reliability, but this is what I've gotten to work best with Nemo so far anyway. Maybe try the instruct from debasedai combined with the default sillytavern Mixtral for formatting.
>>
>>101980784
im gonna fucking cut your balls off lemmy.
>>
no cum in my mouth = not open source
>>
>>101980757
/lmg/ is under a state of permanent discord tranny raid. If anything it's better than it used to be, a few months ago they would start spamming scat and loli porn whenever they rightfully got told to ack themselves and fuck off.
I guess at least some of them did because now the worse that happens is meanie words towards noobies.
>>
File: 1635706851250.jpg (47 KB, 600x800)
47 KB
47 KB JPG
>>101980307
Command R is good but the low context fucking kills it completely for me
>>
>>101980757
It's just one guy. Wish there was a report button for "slowly trying to rot the thread from the inside out by shutting down all discussion".
>>
File: cheers clinks.jpg (59 KB, 400x400)
59 KB
59 KB JPG
I only found this from the relentless mini magnum shilling but if you guys have a 24 GB card, you need to try this shit

https://huggingface.co/anthracite-org/magnum-v2-32b-gguf

Command R, Gemma 2, Mixtral, Nemo, none of them came close to giving me the Character AI vibe but this one is probably the closest I feel like my PC can get to (24GB VRAM, 32GB RAM). I'm by no means meaning to pretend that i've discovered it, shit seems popular but i've heard a fuckton of mini magnums but can't recall seeing that one dropped in here.

I'm sure someone will call it slop, but when you're not running 70b or large mistral models, it's all slop.
>>
>>101980853
Command-r has 128k context, what do you mean?
>>
>>101980985
Ain't nobody got ram for that
>>
>>101980978
Post Nala (from the classic children's movie The Lion King)..
>>
>>101980849
I think there's probably two dedicated schizos who are determined to wreck the thread, but that's all.
>>
anthracite haters are non white
>>
>>101981027
To what end? Who benefits?
>>
>>101981042
one is a discord troon who hates anthracite due to shitty drama, the other is the ad schizo
>>
>>101980968
iirc go to irc and get a hold of a mod for that. You need to compile your case, however.
>>
>>101980992
>>101973197
>>101974027

What are you even annoyed at that makes you spam this shit lmao
>>
>>101981042
Well, either OAI if they're paid, or them, in being able to jerk their tiny little penis to the thought of being offputting enough that nobody wants to be around them.
>>
>>101980853
Overall so far Nemo at 16K has been the best experience I've been able to get off of a local model on my 12gb card.
>>
File: file.png (9 KB, 293x131)
9 KB
9 KB PNG
>>101981065
take your meds
>>
>>101980978
>Qwen 1.5

Hmm... Any other anons got experience with that model? How's it stacking to the flavor of the month slop like Nemo?
>>
>>101981050
Is the guy complaining about open source the same one from a while back who sperged about model licenses?
>>
>>101981050
Isn't it possible that closed source AI is trying to shut down the competition, astutely recognizing that these threads are essentially the primary hub of open source in this space, and knowing how easy it is to shit up general threads in 4chan?
>>
>>101981042
No one benefits. When you're a basement dwelling troglodyte who knows they will die alone, you don't care about the fact that you won't benefit. Your only concern is in also ensuring that no one else does. Your greatest fear is the idea that anyone else might be less desperately unhappy than you are.
>>
>>101981073
Probably someone in connection to a proxy owner in /aicg/ who's afraid of local taking off because that'd be the end of taking easy money from idiots.
>>
>>101981091
Nigga, it's literally mini magnum for 24GB cardfags. It's about as good as we're getting with 30B models not made by Cohere
>>
>>101981091
Anon, that model is trained on top of base. It won't carry any of the Qwen1.5 slops
>>
>>101981099
take your meds, the guy sperging about anthracite not being open source has personal drama with them. and another is just a schizo.
>>
>>101981109
I don't think it's a conspiracy. It's purely the fact that there are plenty of people using 4chan who are truly, desperately fucked in the head.
>>
>>101980968
Stop making excuses and add the links, Kalomaze.
>>
>>101981159
yeah I think it's pretty easy to get at least a handful of people on here on board with hating just about anything
>>
>>101981167
Do not insult the ugly ass bear twink you cunt
>>
>>101981038
Get a job, MangyMango.
>>
File: heath-ledger-joker+.jpg (54 KB, 960x540)
54 KB
54 KB JPG
>>101981183
How do you know that they're an ugly ass bear twink, Anon? Is there something you'd like to share with the rest of the class?
>>
File: aca.jpg (58 KB, 511x562)
58 KB
58 KB JPG
>>101981146
>>101981139
Wait, so is it unironically good?

I'm downloading it but shit, just surprised i've not seen much mention of it on here. All I hear is Celeste this, Mistral that
>>
>>101981195
Hey Lemmy, Kill yourself.
>>
>>101981207
i am asoooooming
>>
>>101981219
anon all magnum models are great ,wish theyd do a 9b
>>
>>101981219
>i've not seen much mention of it [until now when the staff are in the thread to advertise their new release]
hmm...
>>
>>101981087
nice vpn
>>
Are Magnum finetunes only good for rp and worse for general questions? One thing that annoys me about Mistral Large is when it misunderstands my question just so it can add a be respectful disclaimer.
>>
>>101981099
>essentially the primary hub of open source in this space
This thread is against open source.
>>
>>101981247
>>101981251
>>101981219
>>101981146
>>101981091
Why do people even look to 30B models?

I have a 24GB card and I still to 12B personally. I don't notice that much of a jump in intelligence and the speed/increased context I can fit more than makes up for it.

Am I doing it wrong bros
>>
>>101981159
i'm pretty sure that being on 4chan already makes you a bit neurodivergent. reddit is better in every single way if you aren't a schizo.
>>
File: fVqKEkS4UP.png (9 KB, 1597x140)
9 KB
9 KB PNG
>>101981278
what if im a bit banned from there.
>>
>>101981219
lol the shilling couldn't be more obvious
>>
>>101981271
You're fine, the older 30B models are beaten by today's 9B and 12B models.
>>
>>101981271
>am I a retard for using models designed for 12GB cards
Nigger, you can fully fit 30B models on your GPU and it'll still be fast as fuck.
>>
>>101981271
the 32b is great, try it. if you dont like you can always just delete it later and go back to your preferred model
>>
>>101981247
nah, they are too horny. whenever i try to chat with it, it always jumps on my dick, it's annoying because i like slow developments.
>>
>>101981259
Probably the better ones if you have to be a finetune faggot.

Nemo Instruct however >>>>>>>>> finetunes
>>101981290
>shilling free shit
Go back
>>
>>101981278
localllama is full of some of the dumbest retards I've ever seen in my entire life, legitimately an anti-source of information on the topic because they are constantly wrong about everything
>>
>>101981092
I'm pretty sure the anti-merge, anti-kobold and anti-NAI guy is one person.
>>
>>101981287
you got banned for a reason
>>
>>101981292
This is pure poorfag cope btw
>>
>>101981307
It's being worked on.
>>
>>101981313
hi koboldai discord nigger
>>
>>101981220
Do you really take pride in spouting stupid shit like that? It's sad, really. I hope you fix yourself eventually. You won't get anywhere with that kind of attitude. What do you even do in the org?
>>
>>101981320
it got invaded by normies, the niche communities are much better
>>
>>101981329
a lot of it is indeed just the /aids/ schizo, he also posts completely contradictory opinions about those things here and in other threads
he also has a hanger-on in here who for some reason adopts a lot of his talking points but is identifiably a different person
>>
>>101981360
gotta figure out who i am first...
>>
>>101981092
Open source benefits everyone, friend.
>>
>>101981360
yeah i do when its to you lemmy, kill yourself and take all of your trannies with you.
>>
>>101981391
Go back to /aids/, schizo.
>>
File: 128h100.png (232 KB, 657x479)
232 KB
232 KB PNG
I have access to 128x H100s. What do I do.
>>
>>101981380
kill yourself opencuck
>>
>>101981398
hi alpin
>>
>>101981373
>>101981391

I don't really care who you might be. Knowing that some members of Anthracite behave like degenerates is enough. I've seen their tweets and it makes me wonder what kind of shitty life they have behind the screen.
>>
>>101981380
Not the people who want AI to not kill everyone.
>>
>>101981398
Fine-tune a random model on the C2 logs again.
>>
>>101981429
https://x.com/anthraciteorg

we just enjoy focks girls
>>
>>101981398
Help the antracite friends to fine-tune a LLaMA 3 405B Magnum
>>
>>101981431
>Implying big corpos wouldn't kill everyone for profit
They are already doing it
>>
>>101981398
Train 200 different 4bit r8 qloras for l3-8b in parallel to dump on huggingface
>>
File: file.png (21 KB, 384x573)
21 KB
21 KB PNG
>>101981446
buy an ad elon
>>
>>101981398
Give it back, Ja-
>>
>>101981398
kill yourself alpintroon
>>
>>101981398
mine shitcoins
>>
File: 1719106021501293.png (26 KB, 387x348)
26 KB
26 KB PNG
Now that the dust has settled fully, whats the /g/erdict on the Largestral Magnum Dong 2?
>>
>>101981525
flop
>>
>>101981525
It's like Largestral, but worse. 7/10
>>
>>101981525
https://huggingface.co/collections/anthracite-org/magnum-v2-123b-66c39cf6182d5a69ae52ef50
>>
>>101981525
I won't download it until the wandb and config are in the readme.
>>
>>101981525
buy an ad
>>
>>101981429
>goontuners act exactly like goontuners
what the fuck were you expecting
>>
File: 1701498584002319.jpg (16 KB, 480x360)
16 KB
16 KB JPG
>>101981566
>>
>>101981219
I helped make the model. It requires a little bit of sampler wrangling, and in early context it is admittedly shaky, but in deep context it cooks pretty well alright.
In my eyes, KTO, or some form of generalizable RL against incoherence / spatially unaware bullshit / poor comprehension, is the next logical step for the Magnum series. but, our KTO rejected data needs work (i.e, prioritizing rejected data that is *recognizably* bad, rather than always using the synthetic answer as rejected; this is one of the things planned)

SFT training is a positive reinforcement signal, but there is no negative reinforcement against assigning weight to "bad options". so, i would say it's not enough by itself.
I feel people are mainly just afraid to use KTO so far because vanilla DPO by itself has serious issues (from a design perspective) and Axolotl doesn't have a functioning KTO implementation atm (while Llama-Factory does).

>>101981339
llama1 34b is 2k context and only 1T tokens. Idk if you were here back then, but it was *really* bad. I would expect that it gets mogged by a well fit 12b trained on 4x that much data.
>>
>>101981313
>Nemo Instruct however >>>>>>>>> finetunes
you're never gonna believe me when I tell you what Nemo Instruct is.
>>
>>101981616
>>101981616
>>101981616
>>
>>101981398
Train a flux finetune for h/hdg
>>
>>101981038
Imagine having nothing else of value as a human being so you desperately flex the only thing your deluded mind thinks defines a person.
>>
I am having trouble posting recaps again. Can someone please post it for me? https://pastebin.com/zPbVkFR0
>>
>>101981693
Sure.
>>
>>101981693
Ok but not because I like you or anything. Only because you asked nicely and I felt pity. You... you better not get used to it.
>>
File: ComfyUI_00795_.png (991 KB, 1024x1024)
991 KB
991 KB PNG
>>101981725
>>101981739
>>
>>101981007
You can fit 64k in 4gb, and have 20 left over for layers and run it at a decent speed. Or slowly with only an 8gb card at that context size.



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.