[a / b / c / d / e / f / g / gif / h / hr / k / m / o / p / r / s / t / u / v / vg / vm / vmg / vr / vrpg / vst / w / wg] [i / ic] [r9k / s4s / vip / qa] [cm / hm / lgbt / y] [3 / aco / adv / an / bant / biz / cgl / ck / co / diy / fa / fit / gd / hc / his / int / jp / lit / mlp / mu / n / news / out / po / pol / pw / qst / sci / soc / sp / tg / toy / trv / tv / vp / vt / wsg / wsr / x / xs] [Settings] [Search] [Mobile] [Home]
Board
Settings Mobile Home
/g/ - Technology


Thread archived.
You cannot reply anymore.


[Advertise on 4chan]


/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102444258 & >>102434744

►News
>(09/18) Microsoft releases 16x3.8B GRadient-INformed MoE: https://github.com/microsoft/GRIN-MoE
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5/
>(09/18) Llama 8B quantized to b1.58 through finetuning: https://hf.co/blog/1_58_llm_extreme_quantization
>(09/17) Mistral releases new 22B with 128k context and function calling: https://mistral.ai/news/september-24-release/
>(09/12) DataGemma with DataCommons retrieval: https://blog.google/technology/ai/google-datagemma-ai-llm

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: img_42.jpg (116 KB, 512x512)
116 KB
116 KB JPG
►Recent Highlights from the Previous Thread: >>102444258

--Formatting for user-only and AI-only comments in Sillytavern, and using scenario override and macros for dynamic objectives and story beats: >>102446780 >>102446919 >>102446987 >>102447000 >>102447183
--Discussion on China's AI progress, benchmark performance, and the debate over 0-shot vs. CoT reasoning models: >>102444396 >>102444538 >>102444660 >>102444692 >>102444742 >>102444759 >>102444770 >>102444804 >>102444821 >>102444836 >>102444838 >>102444872
--Discussion on AI's ability to solve competition math problems: >>102445940 >>102446192 >>102446236 >>102446329
--WonderWorld GitHub repo still a README, despite paper promises: >>102445600 >>102445615 >>102445829
--Llama.cpp developer discusses requirements for adding new samplers: >>102444940 >>102445039 >>102445343 >>102445389 >>102445605 >>102445635 >>102445695 >>102445865 >>102445875
--GRIN MoE performance compared to other models: >>102449377 >>102449398 >>102449413 >>102449470 >>102449494
--Mistral-Nemo-Instruct-2407 outperforms Mistral-small in JP translation: >>102447665
--Hugging Face's extreme quantization allows for 1.58bit LLMs, but with some performance drop: >>102444560 >>102444608 >>102444653 >>102444655
--Qwen 2.5 72B performs well for ERP up to t=1.7: >>102444722 >>102444794 >>102444860 >>102444930 >>102445625
--Nemotron-Mini-4B-Instruct Nala test and model background: >>102449523 >>102449588
--Nala and Qwen2.5-Math-72B-Instruct experiment results in incomprehensible output: >>102445823 >>102445874
--Discussion on implementing CoT locally, with challenges and potential solutions: >>102446846 >>102446905 >>102447043 >>102447122 >>102447062
--Anon tests temperature settings for generating responses and evaluates NALA score: >>102445088
--Advice for optimizing local AI model testing with limited GPU: >>102447333 >>102447389 >>102447783
--Miku (free space): >>102444310 >>102446044

►Recent Highlight Posts from the Previous Thread: >>102444269
>>
File: nala grin-moe.png (119 KB, 942x336)
119 KB
119 KB PNG
So GRIN-MoE is incapable of actually responding to overly-structured promps such as sillytavern when using the Phi formatting but if you use an alpaca-template it will RP. But given that it's trained on the Phi data and the Phi pretraining data is devoid of any smut it probably won't actually go very far with a proper NSFW RP.
>>
>>102450040
>won't actually go very far with a proper NSFW RP
Does this mean she will finally eat you like a real lion should?
>>
>>102449897
>>102449874
I wish death upon all mikufaggots and want /lmg/ to die but I think it is just people too busy cooming to mistral small and qwen.
>>
>>102450058
Maybe. This could be a major win for vorefags.
>>
File: file.png (55 KB, 998x518)
55 KB
55 KB PNG
Qwen2.5 72B is now definitely the best local model under 100B parameters for JP>EN translation. Nice!
>>
>>102447333
here

>>102447389
>>102447449
you're right, I forgot speccy was stupid as shit -- GPU-Z is reporting correctly 8192MB vram

I did get a "Mistral-Nemo-Instruct-2407-IQ4_XS.gguf" loaded (other ones I found didn't load up right) with the Mistral presets in sillytavern

I had been gunning for the Q4 when possible, and leaving out the "--quantkv 2 --flashattention" args. Right now my CLI looks like:
call bin-kobold\koboldcpp_cu12.exe --model "N:\IGGER\F\A\I\Mistral-Nemo-Instruct-2407-IQ4_XS.gguf" --contextsize 12288 --threads 7 --blasthreads 14 --usecublas normal 0 1 --gpulayers -1  --blasbatchsize 512 --highpriority --foreground --skiplauncher --nommap --usemlock --onready "SillyTavern.bat" %*
>>
Trying to load magnum-12b-v2.5-kto-Q5_K_M.gguf gives
>raise ValueError("Failed to create llama_context")

low vram or what?
>>
>>102449993
>grin moe
neat, new toys to play around with
>4k context
AAAAAAAAAAAAAA AT LEAST MAKE IT 8K FFS YOU FOOLS!
>>
>>102450104
We're so back.
>still doesn't know what mesugaki means according to >>102446773
It's so over.
>>
>>102450178
>he cares about animu shit knowledge
Pathetic.
>>
>>102450175
rope yourself
>>
File: file.png (8 KB, 576x61)
8 KB
8 KB PNG
>>102450104
I liked seeing that it got most of the 'tricky' translations right, like picrel. But Qwen2.0 wasn’t bad at that either, so it doesn’t seem like a huge upgrade over it, as can be seen on the leaderboard.
>>
>>102450207
I can assure you, (You) will rope yourself way before the rest of us will.
>>
>crazy thursday is over
>thursday hasn't even come yet
>>
>>102450214
>no u
Fried your brains huh?
>>
>>102450139
what are you trying to load it with?
try koboldcpp if you're not already
>>
>>102450238
using oobabooga. I'll give koboldcpp a shot
>>
>>102450198
Yes because it's an indicator, just like the Castlevania question. If they trained on 18T and none contained such knowledge, it means their filtering was very strong and likely many more types of knowledge have also been filtered away, so much so that even 18T doesn't help it. This would be a strong indicator that the model, like the last one, has very little cultural knowledge in general. Might also be bad at more niche RP that isn't the cookie cutter shit. And probably will be bad as well for assistant tasks when it involves more niche subjects.
>>
>He uses ooba in the year of our lord 2020+4
Anon, I...
>>
>>102450261
First day wrangling with this. I thought it was good enough to start with
>>
>>102450175
So far in testing it seems to completely break apart at 2400 tokens of context. So it can't even do 4K, at least not with extensive back and forth.
>>
>>102450272
You thought wrong.
>>
>>102450274
>Already breaks under 2500 tokens
How the fuck? I get it that not every model can or should be 16k+ in their first iterations, but holy crap.
>>102450272
ooba is just like the webui thing for image AI, they work, but there is far better alternatives available.
>>
>>102450261
It's fine, fuck SillyTavern's bloat and Kobold's jank.
>>
File: file.png (79 KB, 859x607)
79 KB
79 KB PNG
>>102450178
yikes. I definitely wouldn't recommend this model for learning Japanese.
>>
>>102450307
Sometimes at 2K even, jeez.
But at the same time.. It's actually good for certain scenarios (quasi-reluctant but consenting character who has conflicted emotions about the situation) which is frustrating, dealing with Llama-1 context window.
>>
>>102450349
kimesu no yaiba
>>
File: 1725922368500279.jpg (649 KB, 2384x1808)
649 KB
649 KB JPG
>>102450000
Checked and ty for your service, recap bot
>>
>>102450319
ooba cuck
>>
File: ooba.png (103 KB, 845x817)
103 KB
103 KB PNG
>>102450319
what's worth using from these?
>>
File: chuck-e-cheese-okay.gif (3.29 MB, 640x640)
3.29 MB
3.29 MB GIF
please someone using TTS point me in the right direction here, have tried several TTS packages from github, most of them couldnt even install a working venv, the working one OOM'd when it tried getting RVC working (and couldnt generate its own json config?)
>>
Does anyone else ever download a model and fire it up solely for testing purposes but then end up unintentionally having a 3 hour long goon sesh with it?
>>
>>102450910
>goon sesh
no, no I don't, zoomer nigger faggot.
>>
File: No fun allowed.png (298 KB, 745x745)
298 KB
298 KB PNG
When we finally get out AI's smart enough as well consumer grade robotic bodies, do you think the government will try to hardcode the AI's to never touch a gun or cause human harm? Or do you think they will take a more utilitarian approach and try to hardcode the AI's to kill a human who if not taken down will kill 10 others?
Lord knows the government won't just let it be without restrictions, the government hates any form of fun.
>>
>>102450938
Gov loves anything that gives them more power, which means robots that become their willing soldiers without scruples that will never doubt their commands. Think Star Wars Episode 3, but with machine robots instead of fleshy robots.
>>
>>102450910
I always test a model for a long time before deciding if it's better than what I was using before.
Nowdays it's rare for a model to be so shit from the get go that I discard it quickly.
I think ever since mixtral 8x7b, things have been in a pretty good state in general.
>>
>>102450910
No this has not happened to me because no open source model has ever been good enough to cause that. Maybe some day.
>>
File: Super battle droid.jpg (219 KB, 1920x1079)
219 KB
219 KB JPG
>>102450962
I hope the Gov lets me own a Super Battle Droid for home defense.
>>
I just tried Grin-MoE and the model is indeed very retarded, OP please take it out from the OP next thread, it's definitely not worthy to be there.
>>
>>102451015
>a Super Battle Droid
one (1) B1 battle droid is all you need
>>
>>102451015
>>102451075
>I'm equipping my animatronic OC girls who have massive ballistics with real (real) ballistics and they'll be trained in CQC too
get the best bang(arang) for your buck there
>>
>>102451075
A B1 battle droid? Why do you need a B1 battle droid? Citizens should only be able to own protocol droids and not droids made for war.
>>
>>102451075
B1s are fucking trash, literal paper weight for mass production. What I want is a B2 for home defense, something study that can survive a hit.
>>
>>102451067
It's baffling that people still pay attention to Microsoft model releases, they have by far the biggest delta between benchmark results and actual model performance of any of the big companies

It's extremely embarrassing tbdesu, Microsoft is a rich megacorp with all the resources in the world, but their AI guys are shamelessly gaming benchmarks in a way you only usually see from shitty startup grifters
>>
>>102450104
Even the top model on that list can't translate things right, so it's hopeless.
>>
>>102451215
My guess? Microsoft is simply playing their extremely retarded shareholders (comes with being a shareholder), for which the simple method of playing benchmarks is extremely effective at. Higher number = better = worth investing.
>>
File: Untitled.png (1.01 MB, 1080x2895)
1.01 MB
1.01 MB PNG
A Controlled Study on Long Context Extension and Generalization in LLMs
https://arxiv.org/abs/2409.12181
>Broad textual understanding and in-context learning require language models that utilize full document contexts. Due to the implementation challenges associated with directly training long-context models, many methods have been proposed for extending models to handle long contexts. However, owing to differences in data and model classes, it has been challenging to compare these approaches, leading to uncertainty as to how to evaluate long-context performance and whether it differs from standard evaluation. We implement a controlled protocol for extension methods with a standardized evaluation, utilizing consistent base models and extension data. Our study yields several insights into long-context behavior. First, we reaffirm the critical role of perplexity as a general-purpose performance indicator even in longer-context tasks. Second, we find that current approximate attention methods systematically underperform across long-context tasks. Finally, we confirm that exact fine-tuning based methods are generally effective within the range of their extension, whereas extrapolation remains challenging. All codebases, models, and checkpoints will be made available open-source, promoting transparency and facilitating further research in this critical area of AI development.
https://github.com/Leooyii/LCEG
Nice to see all the methods finally tested against each other in a controlled manner
>>
>>102451215
Microsoft has long since lost the ability to make anything truly good. As a company they are coasting solely on the momentum they gained in their earlier years.
>>
>>102451259
>Microsoft is simply playing their extremely retarded shareholders
I would hate being a boss of a company, when I was growing up I was brainwashed on thinking the boss was the king, but it turns out it's the most desperate place, you always have to suck the shareholders's dick to survive, that's fucking depressing
>>
>>102451283
You just gotta become a shareholder then, so people are sucking your dick and you are a king.
>>
>>102451283
Silly people always think that being the boss of a huge company like Microsoft would be cool, only ever thinking of the money, fame and what have you. But in reality it's one of the worst jobs you can have, which might explain why CEOs of huge corps like this tend to be cunts, just making the best with what they have and continue to gain. Corruption at it's finest. The moment a company becomes public it eventually dies, no matter what.
>>
>>102451293
to be a shareholder you need to be rich though so... unless you're the son of a richfag, you gotta start somewhere kek
>>
>>102450910
I don’t understand the concept of testing a model without masturbating.
>>
>>102451160
Actually just need a B4 and gentle parenting
>>
>>102451298
>which might explain why CEOs of huge corps like this tend to be cunts, just making the best with what they have and continue to gain.
maybe the opposite is true, to be a CEO you must be a fucking cunt that has no problem selling your soul to the devil or some shit
>>
>>102449993
all right it's fucking zero. are you happy you crazy fuck?
>>
>>102450104
ty for updating
>>
I hope they figured out immortality in my life time, so I can finally utilize my hoarded wealth rather then passing it on to my kids and dabbing on the poor for good measure.
>>
>>102451334
Good point, it's likely a mix of both if you ask me. To make a fuck ton of money you either gotta be lucky (which later turns into corruption/greed), or be a fucker from the get go that only cares for money, fame yadayada.
>>
>>102451283
Work a job, have 1 boss
Be the boss, have 8 bosses
Own the company, have a million bosses, and the ones you talk to most are Reddit and discord mods
Hell on earth
>>
>>102451353
>I hope they figured out immortality in my life time
I can feel they'll find this shit at the moment I'll be a old fart, so there wouldn't be any point
>>
>>102451366
>Work a job, have 1 boss
>Be the boss, have 8 bosses
depends, if you're a big manager yeah you only have the CEO to bother you, but if you're a simple employee...
https://www.youtube.com/watch?v=3wqQXu13tLA
>>
>>102451377
I swear people misunderstand on purpose just to have something to disagree about
“Big manager” != “THE boss”
>>
Now that the dust has settled was >>101516633
Llama 3 405b
any good for coom? daily tasks even?
hello? (hello?)
>>
Didn't check in these threads for a few months, have vramlets anything better than nemo yet?
>>
>>102450261
Is it really that bad?
>>
>>102451435
mistral just brought out a 22b like yesterday
>>
>>102451459
it worksTM
>>
>>102451423
>refuses to be used for sex
>no creativity, just 19th century YA slop writing
>can’t fucking code for shit
>can’t into geometry
Llama3(.1) was an embarrassing benchhacked abortion and Zuckerberg is once again a lizard person until further notice
>>
https://huggingface.co/teto3/mistral-nemo-storywriter-12b-240918
Since it's trained on base I don't think it will be any good for RP. I'll keep working on the dataset while looking for cheap gpu rents to train largestral. Will probably do mistral small before that as well.
>>
>>102451459
No it’s just shills for the conglomerate
>>
>>102451500
Ah yes the conglomerate of big llama
>>
>>102451479
Is it better?
>>
///BAD NEWS!!!///
I've tried making Q6_K_L quants myself and it appears that llama-quantize is broken big time! (At least for some models)
>--leave-output-tensor, --output-tensor-type and --token-embedding-type don't work on windows at ALL for some reason
>on linux having --output-tensor-type and/or --token-embedding-type produces the SAME gguf(checked shasum) for gemma, which isn't right
>for small old mistral(7b) they appear to be working correctly on linux
CUDAdev, please verify and inform ggerganov. There may be more models where those options are broken.
>>
>>102451529
try it yourself
>>
oh no...
anyway
>>
>>102451540
I'll just assume it's not.
>>
>>102451547
you do that
>>
>>102451547
nothing is good or better compared to claude or gpt
there is your answer nitwit
>>
>>102451324
I just came to arcanum 12b a few minutes ago and am now having a smoke
>>
>>102451532
>((BAD NEWS)) in ((NIGGERGANOV)) world once again
>>
>>102451524
Once AGPL is violated for money all bets are off
The only non shill ui recommendation is building your own
>>
>>102451532
>windows
User error
>>102451560
You joke but ever since my crippling addiction began I’ve been genuinely afraid that I was stroking out from peaking too hard/often at least a dozen times
>>
>>102451583
So we agree that every solution is trash? Gotcha
>>
File: file.png (755 KB, 628x767)
755 KB
755 KB PNG
>>
>>102451623
Pochi = pure sex
>>
>>102451623
Those ears look like catacombs
>>
>>102451491
Thanks anon,

But yeah just gathering info
I saw the miqumaxx guide and it said "For 405b class dense models you need more than 424GB+ to even run at a non-braindead quant+context"

Im soon to 'miqumaxx' for myself and test a few things at Q8_0 or Q6_K quant (in december) eg (maybe not) H*rmes-3-L*ama-3.1-405B-GGUF
and report back
>>
>>102451532
Huh. I remember testing output and embedding layer quantization a while ago and encountered an issue where it wasn't recognizing the Q8_0 name. Turns out for some reason the case mattered and you had to use "q8_0". Probably a bug. I couldn't report it due being banned from github and being too lazy to look into why + fixing it.

Here's an example of a full command I used to quant models. Maybe it still works this way.

./llama-quantize --allow-requantize --imatrix path_to_imatrix.dat --output-tensor-type q8_0 --token-embedding-type q8_0 path_to_model_folder/model_name-IQ2_M_EOQ8_0.gguf IQ2_M
>>
>>102451623
retard poster
>>
>>102451643
The funniest part of this was when I noticed how shitty the ears are and went back to the training material see how she actually draws them and... yeah she can't draw ears for shit but you just never look at them.
>>
>>102451623
Why is it so damn low res though? I thought local gen could easily do 1024 and beyond now days?
>>
>>102451657
>>102451532
Also btw you should see in the console as you quantize a model, what quant type it is using for each layer. You can see there directly if the output and embedding layers are not following the quant type you set.
>>
>>102450877
fish-speech is alright, though you'll need to process results to eliminate occasional lengthy pauses. I have the solution, yet I'm hesitant to submit a pull request as it's more of a workaround than an actual fix.
>>
>>102451657
>>102451532
Actually wait sorry I forgot to put the bf16 model path in the command.

./llama-quantize --allow-requantize --imatrix path_to_imatrix.dat --output-tensor-type q8_0 --token-embedding-type q8_0 path_to_model_folder/BF16_model.gguf path_to_model_folder/model_name-IQ2_M_EOQ8_0.gguf IQ2_M

This should be correct.
>>
>>102451693
got a link to a simple launch.bat frontend thingy i can use?
>>
13 minutes left... qwen will drop the 199b...
>>
File: 1726685429541087.png (463 KB, 512x760)
463 KB
463 KB PNG
>>102451746
For tts? No, prepare to get rect'd by the python library hell. All sound-related ML projects are absolutely not user-friendly. You'll need experience with conda/venv and be prepared to edit pyproject.toml and manually resolve dependencies.
>>
>>102451650
>hermemes
Worse than base instruct
>>
>>102451820
How come that sound related stuff is so very user unfriendly, while not just img but text projects all have varies of friendly options available?
I know that at least one music model can be used through comfy, but that's about it.
>>
>>102451835
Thanks for info, Im shitposting until I get my parts
>>
>>102451490
This thing can't use GPTQ. Is GGUF really worth using with such slow gen times? Maybe I have brainrot.
>>
File: 1726685983199582.png (468 KB, 512x760)
468 KB
468 KB PNG
>>102451841
Who knows. Perhaps only sociopaths find this field appealing, or maybe engaging with it leads one down that path.
>>
>>102451870
My idea is that there hasn't been many worthwhile models/systems around it so far, so creating a user friendly interface has been out of mind for anything able to... Or you're right and everyone into this stuff simply goes insane for one reason or another.
>>102451868
Isn't GGUF plenty fast? How much faster can GPTQ really be? I only ever hear people argue between GGUF and exllama 2 or whatever also, but I'm far from an expert.
>>
>>102451550
Actually, I think I'll give it a try with my Kyou card.
>>
today is gonna be crazy
>>
>>102451985
why? explain.
>>
>>102451675
It can, it's just slower and if you just wanna goon then I guess lower resolutions work just as well
>>
>>102452077
>It's just slower
Yeah, no shit, but it can also result in better images, depending on your model. At least that's something I noticed back with SD 1.5 and the like when I still fucked around with imggen, surely we're way past that point with XL and 2.0, or whatever the most recent meme model is everyone ""tunes"".
>>
>>102451494
Cool, I'll try it out when I boot up the box later tonight.
>>
someone try genning some explicit porn loops through here and share results so i can know if it works later for early morning jack off sessions thanks
https://huggingface.co/spaces/THUDM/CogVideoX-5B-Space
>>
>>102452149
haha lol this is the wrong general my bad

done got my dead generals mixed up again doink
>>
>GRIN performs well overall on mememarks despite failing coding and translation tasks
Intredasting, maybe I'll take a shot at it.
>>
>>102451870
>shirt almost transparent
SEX
>>
>>102451494
Storywriter was one of my favorite models (esp when added to Tenyx back in the day) please keep up the good work anon.
>>
Does silly have xtc yet on staging or do I still need to use the xtc-obba branch?
>>
i wholeheartedly believe anyone recommending l3 in any facet is trolling as hard as possible.
>>
Are there any finetuners that actually manage to disable those safety disclaimers?
>>
https://github.com/kyutai-labs/moshi/issues/51
>Yes that's expected, once it reaches the max cache size, conv will stop. That should match roughly 5 min of conv. We will try to expand that in the future.
lol. lmao, even.
>>
>>102452410
>why no infinite cache
yeah lol at you fucking retard
>>
>>102452459
>he doesn't understand
>>
https://x.com/homebrewltd/status/1836356000191762480
nothingburger
>>
>>102452521
isnt this already out as a accessible feature? Could anons already talk to their home-bots by voice right?
>>
>>102452604
Yeah, it's still whisper + llama3 but with a fewer steps in between.
>>
Is Mistral Small really worse than Nemo?
Is the new Qwen really worse than Mistral (any)?
I don't have time to test all these new models. Wish we had a reliable RP benchmark.
>>
>>102452649
>Is Mistral Small really worse than Nemo?
Small is definitely smarter than Nemo and any Nemo finetunes, though not by a huge margin
(good) Nemo finetunes are better than Small for RP
>>
>>102452756
>(good) Nemo finetunes
There are none, unironically.
>>
>>102452778
I disagree
Admittedly the finetunes do generally get dumber than the originals they're based on, but for RP purposes they're more creative and dialog is less bland.
>>
>>102451693
Make a draft PR.
>>
>>102452236
Different anon fyi.
>>
>>102452793
The whole approach is wrong. I detect and remove pauses in the final audio. Should be integrated into earlier stages.
>>
>>102452790
What do you consider to be the better Nemo finetunes? I've tried Magnum v2 and NemoMix Unleashed and thought both of them were pretty good for the size. Are there any others worth trying out?
>>
File: me_hitting_submit_again.jpg (319 KB, 1125x582)
319 KB
319 KB JPG
>chatbot porn site pops up selling a 7B at psychotic prices and moralfagging about fetishes
>they use stripe
>I wait one(1) month for the flywheel of financial dependence to spin up
>I report them to stripe
>stripe drops them and claws back as much money from them as they can
>they go bankrupt
I have done this six (6) times and it never gets old. Just like their companies.
>>
>>102452834
buy an ad
>>
>>102452835
Based if true.
>>
>>102452835
Devilish
>>
what llm is best at Ancient Greek?
>>
>>102452861
Qwen 2.5 0.5B
>>
>>102452649
>Is Mistral Small really worse than Nemo?
Yes unfortunately.
It feels smarter. Like complexer instructions or formats are followed.
Still spergs out sometimes for simple stuff. But overall I would say a clear improvement.

Even more than lots of gpt wording, the problem is the positivity bias. I have no idea why the faggots on reddit praise mistral small for RP.
>But it follows everything you prompt uncensored!
It desperately tries to move away from anything naughty.
Worst case was going so far as to name semen "tears". I am not making this up. Penis becomes member etc.
You can give multiple constant occ as a reminder and a long sys prompt, but is that really fun?
And even then: It very sneakily tries to shift the direction to something assistant improved.
Can't word it better, I hope it makes sense. The ideal case would be a model that knows what you want, sniffs it out and delivers.

I like nemo so much besides the retardation because the characters are realistic.
Not sure the finetunes can fix behaviour like that.
Gemma2 27b is like that too. Its smarter. But unusable. Its just not interesting to use for RP.
And I'm not interesting in something like stheno where its just mindlessly horny.
>>
>>102452834
NemoMix Unleashed probably is the best now, I haven't used it too extensively to tell for sure but it's certainly not worse than Magnum. Before that I was on Mini-Magnum v1.1, which I considered the gold standard before NemoMix.
>>
>>102452835
>moralfagging about fetishes
What does this mean? They actually advertise that their porn bot is censored as a feature?
>>
>>102452835
Okay chud. It's their fault for using a payment processor. I'm just planing on linking patreon to mine, good luck shutting me down that way.
>>
>>102452906
NTA but as someone who can effortlessly use gemma for rp with advanced prompting, this is a great ad for mistral-small. Nemo is garbage btw
>>
>>102452947
Using Gemma for RP doesn't require effort, only brain damage.
>>
>>102452947
if you like gemma you probably also like mistral smart.
dont you have to constantly use ooc to move stuff in a certain direction that should be clear?
gemma and mistral small feel like sneakily moving away from what you want. i really cant see how you enjoy it. maybe you have a very good prompt.
>>
>>102452649
It's noticeably smarter than Nemo but more slopped, less creative.
>>
>>102452930
They generally advertise as “[site] but without the {gross, evil, [idiot pearl-clutching buzzword]} stuff” and delete anything that isn’t vanilla
>>102452935
I already voted for Kamala Harris and patreon doesn’t allow incest or rape, at which point is it even porn?
>>
>>102452978
>dont you have to constantly use ooc to move stuff in a certain direction that should be clear?
No.
I posted my method here multiple times when gemma came out, but nobody cared so I stopped. I use a prompt to trick gemma into thinking it has a system role, which generalizes very well. Through that system role I give instructions that enable it to do nsfw without moralfagging or having a positivity bias, and it works well because there's no training data teaching it to deny what was said in the system role. It's not hard.
>>
To be fair it's closer than many other models have been on this question. Lmarena though, so not greedy sampling. Might be different with a reroll.
But man, I really don't feel like downloading a 72B just to test it and probably never use it ever again.
>>
File: qh7wqgpfuend1.jpg (169 KB, 1242x1787)
169 KB
169 KB JPG
>>102453009
>patreon doesn’t allow incest or rape
Extremely high-profile NSFW indie games tend to be distributed through itch.io and patreon that feature both of these things and worse, and have never been taken down
>>
i'm in this university class where we're working with local businesses to develop LLM shit. i'm in a group where they want to build profiles on potential donors for the college so they can automate solicitation of donations.
like, "this person cares about sports and we're trying to fund this new sports stadium so we'll send them an email".

i think it's kind of fucking gross. it's cool to work on i guess but it makes my stomach turn over because of how soulless it is.
>>
can someone with github report this specific st bug
>any lorebook
>hit rename
>only change the case, "New World" -> "NeW WORLd"
>instead of renaming the lorebook, it deletes it entirely
>>
>>102453009
>patreon doesn’t allow incest or rape
subscribestar.adult
>>
>>102453050
Sounds easy, why not fix it yourself and submit a pr?
>>
>>102453032
Yeah they don’t notice if it’s low volume and no one reports it.
>>
>>102453067
i don't have git and its a pretty small issue
>>
>>102453067
>do unpaid labor for a guy that’s making money off it
No
>>
great. want to try qwen2 7b vision model. loads in 4bit, everything fine. i have 10gb vram left.
but if i add a image i OOM. maybe its because of multi gpu. (~4.5gb per card free)
thats what i get after making all the dependencies work.
>>
What prompt template are you guys using for Qwen 2.5? Will ChatML do?
>>
>>102453082
Who's making money off of sillytavern?
>>
>>102453056
Subscribestar is an urban legend.
There are like two people that claim to use it, but all their community/public things are full of people that can’t get ahold of anyone to get approved to set up an account.
>>
File: works-on-my-machine.jpg (59 KB, 800x800)
59 KB
59 KB JPG
>>102453140
>>
>>102453136
St dev is mancer dev
>>
>>102453148
Am I supposed to know what that is? Some other project from the same guy? I've never heard of it and I use st all the time.
>>
smedrins
>>
qwen2.5 mogs everything. vision will be interesting to try.
>>
I mog everything
>>
>>102453209
How many? :3
>>
My 12gb VRAM takes 3-mins per prompt on 20gb but it gens 13b in seconds.
I heard that there's barely a noticeable difference in these two (OP links), is it just a cope?
>>
>>102453012
I should start writing down all the good tricks anons come up with
Can you describe it in a bit more detail?
>>
>>102453245
Sounds like your system is using fallback VRAM, 20B shouldn't be that much slower than 13B, certainly not 3 mins with 12gb
>>
>>102453264
Well, doesn't it make sense that I can't really run 20b? I have 12gb vram, I expect 20b to be slow.
Sorry to shit the thread up, I got into this yesterday and I've spent literally every moment of my day trying to get this to work decently on my rig.
>>
>>102453245
>>102453270
>there's barely a noticeable difference in these two
I guess I should have clarified what I meant
I heard that the quality of output between 13b and 20B~ish is the same and not worth the performance hit. Is that true?
>>
>>102453270
I often run models 2x (or more) the size of my card and I can usually get at least 1T/s
Are you using cpu offloading (with .gguf)? Which backend?
>>
>>102452410
I can’t believe they released a 7B with (a) no training code so I can do a bigger one myself and (b) nowhere to throw money at them to make a damn 70B
>>
>>102453282
I have both Ooba and Kobold installed.
I spent the day fidgeting with .GGPT files in ooba, and although I got full replies in literal seconds, I felt like they weren't up to par.
I tried Kobold because I heard it was faster with .GGUF, and some of the more spicier nastier models seem to be in that format, too.
I'm not sure if I'm using cpu offloading. I'm mostly using default settings. There's several rentry guides in the OP, but none really go super in depth on this kind of stuff.
>>
>>102449993
I'm looking for something that can take an image and give me slight variations of it. Say I have a goblin wearing a party hat, but I want it to be wear a top hat or change it's beard style. Does this exist?
>>
>>102453412
look for the diffusion threads. img2img, inpainting are what it sounds like you're looking for
>>
Played a bit with Cydonia-22B-v1-Q4_K_M since it was posted earlier:
The gpt slop is gone but its just mindlessly horny now.
I show a pregnant milf my dick through a gloryhole portal while she is on the toilet.
Normal reaction would be to freak out and be disgusted. Not *mindlessly starts to touch it*

Mistral small character thoughts in comparison though:
>THOUGHTS: "What is happening? Why is there a...there?!"
T-Thanks mistral-sama. And thats with cydonia dirty words in context..
>>
>>102453444
>The gpt slop is gone
it is not. its not even any hornier than the other nemo or large tunes
>>
>>102453428
Thank you anon
>>
>>102453306
What I recommend doing is either sticking to koboldcpp (and messing with the number of gpu layers to maximize speed) or trying out llamacpp + a frontend of your choice
Make sure to turn off the fallback policy in your nvidia control panel, that way the program crashes instead of slowing to a crawl
You can also try turning on flash attention and using lower-precision KV caches to save vram and offload more layers
All of this should be in the koboldcpp/llamacpp documentation
>>
>>102453444
What settings are you using to get it like that?
>>
>>102453444
>I show a pregnant milf my dick through a gloryhole portal while she is on the toilet.
>Normal reaction would be to freak out and be disgusted. Not *mindlessly starts to touch it*
> Mistral small character thoughts in comparison though: [...]

Default model outputs are a function of their training data, nothing more, nothing less. I don't know how people don't get it yet.

Finetune on smut, you'll get smut back. To have a "normal reaction" you'd have to finetune on no smut at all, mostly non-smut data, or smut designed in such a way that the smutty parts don't occur until much later on in the context.

This just shows that LLMs have no inherent common sense, in any case. Their world model (if that's what we can call it) is very fragile and easily broken by finetuning.
>>
r*ddit says qwen 32b is llama3.1 70b tier

thoughts?
>>
>>102453848
>This just shows that LLMs have no inherent common sense, in any case
but muh superCOT o1
>>
>>102453937
Sounds about right, but that's quite a low bar because 3.1 70B is very mediocre
Mistral Small is much smarter than either
>>
Went to almost 16k context with mistral small and it seemed coherent.
But there is repetition everywhere. From 8k onwards increasingly I have to spot stuff and manually fix it so it doesnt become a habit.
Around 16k its pretty bad though and all over the place. This is normal right?
>>
>>102453938
You can imitate thinking processes and give your model's outputs some sense of logic with chain-of-thought, but what it "wants" to output by default still very much depends on what data you primarily finetuned it on (i.e. what it has seen last).

It's double-edged sword--if LLMs couldn't easily be swayed with limited amounts of data, finetuning wouldn't be possible. Think of it: modern LLMs have been pretrained on something in the order of 10^13 tokens, yet barely 10^6 tokens (or even less) during finetuning can radically alter their outputs.
>>
>>102454027
Yeah, isn't that what drives people to use crap like xtc and dry?
>>
>koboldcpp+ST+Nemo
>ST has DRY sampler checkboxes
>DRY samplers don't show up
kobold says it supports DRY, is this a model problem?
>>
/lmg/ will never recover from the crazy thursday
>>
Disclaimer: I am a complete beginner. Just thinking out loud

I have an older desktop with a low end nvidia card dedicated as a server to run Lama 3.1 8B Q4 using Ollama. Running Debian and using the model via command line using SSH

I am beyond impressed with Llama 3.1 for everything I have tried. I have no interest in using any other model now that a model as good as Llama 3.1 exist for local use. I have tried some of the other models available on

I am considering setting up a faster system and possibly trying a web interface. From what I have gathered the models roughly in the size of Lama 3.1 8B Q4 are going to exist going forward and newer models will likely be even more efficient. I like the idea of a separate system dedicated as a local server vs running on my main machine that I change often

Llama 3.1 70B 4Q seems like the next step up with current models but way beyond what my current setup can run. I am beyond impressed with just the 8B 4Q model. A Llama 4 possibly around the 25B size like the Gemma 2 27b but as impressive as Llama 3 has been will be the next sweet spot. That would make putting together a dedicated system much more reasonable while still give awesome results
>>
come back
>>
File: s.png (426 KB, 1732x925)
426 KB
426 KB PNG
Thanks mistral small, i can now put riddles in my erp. A good day for a vramlet.
Cydonia finetune, if it matters, dont call me a shill alright.
>>
>>102454900
It's funny, because the model clearly has no idea what the answer is to that riddle, only that the solution is most likely to be disgusting.
>>
>>102453946
>Mistral Small is much smarter than either
This type of shameless shilling makes me think there actual paid Mistral shills in this thread.
>>
>>102454921
OK

>>102454260
i dont get why qwen censors for western standards.
they are shooting themself with this. its like chatgpt not writing stuff china would be butthurt about.
>>
File: wordtoliveby.jpg (22 KB, 744x178)
22 KB
22 KB JPG
>>102453049
>he's learning LLMs in school
>>
File: file.png (17 KB, 1106x104)
17 KB
17 KB PNG
>Mistral small takes ~2 mins for a reply on my 7800xt
welp guess I'm gonna go back to nemo
this one any good?
>>
slop status on qwen2.5 and Mistral Small?
>>
>>102455126
>>Mistral small takes ~2 mins for a reply on my 7800xt
You could have made your post useful by saying your t/s at least. "~2 mins" is a meaningless number.
>>
>>102454939
>OK
Anon, it literally made a guess after being told to choose the likeliest possibility.
That doesn't change the fact that it didn't "know" what the solution was.
Take another look at the previous image. Not once did it mention anything related to eating shit.
>>
>>102455168
>meaningless
meaningful enough for me to not use it on my waifu chat sessions
so you got any nemo recommendations? tried nemomix and it was pretty okay
>>
>>102455221
why not just admit you were mistaken?
or are you really one of those "its a autocomplete" retards.
yes, i know, i know its all %. and it still got it right, thats why its awesome. now fuck off.
>>
>>102455244
>meaningful enough for me to not use it on my waifu chat sessions
If you're getting 8 encyclopedia tomes in two minutes it's fast. If you get a "hello", it's slow. Your post is still barren of information.
Rocinante is fine.
>>
>>102455248
>why not just admit you were mistaken?
...because I'm not?
>"its a autocomplete"
...it is.

Well, it's only a good thing that dimwitted normalfags like yourself are capable of running their own models.
>>
File: file.png (25 KB, 590x376)
25 KB
25 KB PNG
Turns out Qwen2.5 32B got a better score than 72B on my VNTL (VN Translation) Benchmark, wow. I can see why, the 32B translations are much more aligned with the reference translations I'm using. I'm not entirely sure if that means the 32B is better, though. Maybe the 72B just got very unlucky with the prompts or the other way around.

Link: https://huggingface.co/datasets/lmg-anon/vntl-leaderboard
>>
>>102455304
my point is that it doesnt matter if its autocomplete and all just % after %.
if the illlusion of coherency is good enough it becomes real.
you said the model doesnt know. literally taken that might be true. the models does not know anything at all then.
but it gave the correct answer, it gave the correct answer to the riddle.
>>
File: file.png (336 KB, 2074x713)
336 KB
336 KB PNG
>>102455342
Here’s a comparison of a few translations.
>>
>>102451841
Most of the sound stuff is just libraries meant to be used with other projects, either directly or through an api. I'm assuming you're on windows which is suffering for anything to do with coding outside of a IDE, particularly managing python environments and WSL tends to break things unexpectedly. Probably the easiest to get set up and running is XTTS2, which you can run as a server and connect to via Silly Tavern. But if you really want to use TTS to its full capability, you're going to want to either spend the 12 hours or so learning basic Python(https://automatetheboringstuff.com/2e/chapter0/) or wait two more weeks for the technology to mature.
>>
Do finetunes even work? I mean yes they make models more horny. But do they improve quality of the smut? Make them repeat themselves less? Make them use less slop? What can you do in a single epoch of training other than just make model more likely to use rp type responses. And they will use those just as well with a bit of prefill or even telling the model to ERP with you.
>>
>>102455342
>>102455356
What do your prompts look like?
Something like "translate the following:"?
Because I've always felt like I get better translation quality if I add in previous (and future) text along with he part that I want translated.
As in:
>from the original text:
>[several lines of text]
>translate the following:
>[single line i want translated]
>>
>>102454130
>xtc and dry
It doesn't work because smarter models just learn to paraphrase what they wanted to say anyway.
>>
>>102455356
this shit is useless unless i see the original text
>>
>>102455416
Fine tunes (LoRA, qLoRA), when done well, mostly just change the style of the model's most likely response (it's "default voice") without degrading the base model.
Overcooked fine tunes make the model retarded and one note (overly horny, always repeat the same structure, etc). Mistral's -instruct tunes tend to be slightly overcooked (on purpose I think) which is why nemo repeats itself so fucking much. It's meant to work well as an assistant, not a creative writer. That also explains why dry or high temp with few sampled tokens make, say, nemo sound more creative (less repetitive and robotic really), I think.
That's my observations from reading a lot and testing loads of model. I never fine tuned a single model in my life, so take it all with a grain of salt.
>>
im new to ais and shit, trying to get a sexualized pokemon mystery dungeon campaing going on, got a rtx 6600 8gb ram and a ryzen 5, what could be a good model for that goal?
>>
>>102455564
Start with mistral-nemo.
You can't expect too much since it is 12b but that's what you can run.
>>
File: 1720590117733152.png (43 KB, 1545x231)
43 KB
43 KB PNG
>>102455461
>That also explains why dry or high temp with few sampled tokens make, say, nemo
You seem to be repeating things like a parrot that you vaguely remember from reading the thread.
Nemo is a model that requires low temperature.
If you had ever used it, you would have found that it gives you completely different, hallucinated answers for simple trivia questions each time you regenerate. Making it useless for assistant stuff. But that kind of randomness is good for creative writing.
Large is the one that needs high temperature.
>>
>>102455564
try this one
https://huggingface.co/mradermacher/Arcanum-12b-GGUF/tree/main
>>
>>102455605
nah nemo does nice at temp 5 topk3 minp 0.1 for creative stuff, it needs weird stuff like all mistrals do
>>
>>102455421
I use a simple text completion prompt that first gives the metadata and then the previous translation pairs, the last one being the single line to be translated, so the model has to complete the English part. As far as I can tell, this works just as well as using the model's prompt format, but I'll try Qwen2.5 again later on OpenRouter with the proper prompt format to see if there's any difference in this case.
Adding the future lines wouldn't work for this imo, because it's usually not available if you're using things like Textractor or OCR to translate VNs.

>>102455451
You're supposed to trust the expected (reference) translations! But fine, here are some of the Japanese lines:
>[愛理]: 「はい。お兄さんですよね?」
>顔中で笑み崩れる。
>――俺には、想像以上で。
>[愛理]: 「よかった、桜乃。これでもう大丈夫よ!」
>[桜乃]: 『うん。私、久々に派手にやらかしちゃった……』
>[新吾]: 「そっか。うん、電話してみてよかった」
>頻度こそさほどではないものの、ひとたび迷子になると、とにかく派手に道に迷ってしまう。
>[新吾]: 「ううん、こう言っちゃなんだけど、迷子でよかったよ。桜乃は可愛いから、いろいろ心配しちゃってたんだぞ俺」
>ともあれ俺は、少し恥ずかしいことを、あえて冗談めかして言う。
>恐縮されるのはとても苦手だ。たとえ相手が妹でも。いや、妹だからこそ、か。
>言った甲斐あって、桜乃は電話の向こうで、いつもの調子を取り戻してくれた。
>[桜乃]: 『うん、迷子ならお兄ちゃんに見つけてもらえばいい』
>>
>>102455605
>you would have found that it gives you completely different, hallucinated answers for simple trivia questions each time you regenerate.
For trivia, sure, that's more of a function of it's dataset than anything.
If you use it for RAG, with 0.3 to 0.5ish temp it does really well, nemo-instruct at least does, much better than anything on its weight class from my testing.
It also falls into repetition of reply structures really easily, things like starting and ending each reply with the same sentence (or a slight variation) if using it as a narrator, etc.
Nemo is weird.

>>102455625
Oh look, those meme sampler settings. Interesting to see people having positive results with it.
>>
File: file.png (240 KB, 2193x943)
240 KB
240 KB PNG
So what's the latest cope? Who is gonna save open source?
>>
>>102455625
>topk3
A lot of models would do "nice" because this is pretty deterministic. You're getting placebo'd if you think you need it.
>>
>>102455416
>Do finetunes even work?
Generally yes.
>But do they improve quality of the smut?
They can.
>Make them repeat themselves less?
Yes, it's possible. I think many problems stem from the exceedingly short conversational data used in assistant datasets (mosly single- or very few turns).
>Make them use less slop?
Also, yes, if you finetune them in your areas of interest with sufficiently varied, novel and cleaned data.
>What can you do in a single epoch of training other than just make model more likely to use rp type responses?
One epoch is probably not enough for significant changes unless you have large amounts of data. Otherwise, if you have consistently styled, organically sourced data and you're not scared of overfitting a bit (which closed sourced companies do anyway), a few epochs could do a lot for the model's general feel.
>And they will use those just as well with a bit of prefill or even telling the model to ERP with you.
I don't understand here, but models will tend to perform the best with prompting resembling their training data the closest.

One problem though is the unfair expectation of finetunes made with data sourced predominantly from ERP logs, stories, fanfictions, to outperform in general intelligence the official instruct finetunes which are to a large extent designed for benchmaxxing. Finetuners with limited compute and/or who don't have large mounts of cash to burn can't easily solve this problem. Large datasets are unwieldy and difficult to maintain. This is one reason why some have opted to finetune the instruct models instead of the base, which comes with additional problems (the instruct model's "safety", style and feel seeping in the outputs).
>>
File: with_scores.png (282 KB, 2193x943)
282 KB
282 KB PNG
>>102455660
>>
>>102455660
Mistral-large with CoT!
>>
>>102455660
llama3 really was a failure, they trained this giant 405b motherfucker for more than 6 months just to be destroyed by the API models
>>
>>102455660
>>102455672
qwen2.5 72b isd gpt4o tier. gonna wait until chink also steal the cot gimmick
>>
>>102455670
>You're getting placebo'd if you think you need it.
went from a repeating schizo to something that i can coom to, so good enough for me either way
>>
>>102455564
Sorry to say but with 8 GB VRAM you will not get particularly good results, the models at that size are just not that great vs. something like ChatGPT.
Using llama.cpp (or something based on it like koboldcpp) you can run models that are larger than your VRAM by running part of the model on the CPU but obviously that will be slower.

For Pokemon in particular my experience has been that even the bigger models like Mistral Large have difficulty getting the anatomy of more obscure Pokemon like Umbreon correct, human-like Pokemon like Lopunny or Gardevoir work comparatively much better.
>>
>>102452801
Yeah, I thought overbaked models like llama3 70b was unsalvageable if not for him showing what a good finetune can do
>>
>>102455660
>>102455672
Reflection 405B is coming. Sam already played his hand sabotaging the 70B. They're going to be ready with countermeasures this time and get the real model out to us.
>>
>>102455725
And maybe they tweaked their CoT after o1 release.
>>
>>102455714
damn, so, investing a bit in something like openrouter or chatgpt would be my best bet? do those have good nsfw models?
>>
feet
>>
>>102455564
if the 12b models you use fuck up the descriptions, you could use a lorebook like
https://www.characterhub.org/lorebooks/cyberlight/actual-pokedex-22c42c1b0655
too to help
>>
>>102449993
Flux dev lora for dall-e-style Migus: https://huggingface.co/quarterturn/chibi-migu-rainbow-style-flux-dev-lora/blob/main/README.md
>>
>>102455660
Grok really fell behind, huh?
>>
>>102455697
>qwen2.5 72b isd gpt4o tier
Sauce?
Also can it do pr0n? qwen2 was worse at lewd stuff than qwen1.5
>>
>>102455648
>completely different, hallucinated answers for simple trivia questions each time you regenerate.
>For trivia, sure, that's more of a function of it's dataset than anything.
If it was overfit, it would have high confidence in its answers and give the same one each time. What Nemo does is say random shit each time, that's the complete opposite of overfit.
I think I'm just talking with a really stupid person, just like this one >>102455705
>>
>>102455761
>Also can it do pr0n? qwen2 was worse at lewd stuff than qwen1.5
the Qwen cucks did their best to remove any instance of NFSW on the dataset training, so the model doesn't know shit about sex lol
>>
>>102455741
I would say just try some of the models for yourself, with llama.cpp you can run models up to your combined RAM+VRAM capacity in size.
And even if right now you don't have a lot of RAM it's a pretty cheap upgrade and would allow you to test some of the better models (at low speeds) to judge whether or not you're interested.

The number one problem you'll have with cloud models is that jerking off to text is against their ToS so you're at risk of getting banned.
Unless you upload a video of yourself drinking piss in order to get access to API keys that someone scraped off of GitHub.
>>
Ok. Bigger "Nemo" is now the best model. All hail the French.
>>
Why hasn't anyone made a lora out of detective Pikachu live action film
>>
>>102455826
because you might be autistic
>>
>>102455840
Sorry I posted on the wrong thread
>>
>>102455820
>no base model
Why are you even shilling it that hard?
>>
>>102454935
Take your meds and then buy a fucking ad.
>>
>>102455660
>>102455672
>4o-mini above sorbet
i don't think this is a good benchmark
>>
>>102455134
Mitigated by avoiding meme samplers and 'ahh ahh mistress'
>>
>>102455787
>TWO MORE MISINFORMATIONS AND I'LL BE A WOMAN
pathetic
>>
>>102455814
i do have 40gb of ram rn, hopefully it will be enough or a nice model then
>>
>>102455884
>multiple mentions of buying an ad this thread too
isn't it embarrassing to have the personality equivalent of a literal spam bot?
>>
>>102455976
>"multiple"
>only 2
Leave the buy an ad anon alone, he's fighting for a good cause.
>>
>>102455852
Doesn't leave much to choose from. The only models that we actually got base models for are Lllama and Nemo.
>>
>>102456049
crazy thursday literally just happened and you already forgot qwen?
>>
>>102456049
Did Qwen really hurt Americans this much?
>>
Veredict on Qwen for RP?
Mistral Small?
I don't feel like testing them myself
>>
>>102451657
>Banned from github
How the fuck do you manage that?
>>
>>102456091
>Qwen for RP
over before it began
>Mistral Small
over before it began
>>
>>102456091
Mistral small is a smarter Mistral Nemo. I don't see anything Mistral large can do that it can't do now with the jump in intelligence and it still has Nemos more fun unhingedsess at higher than 0.5 temps. I think I like it better now, plus it's fast.
>>
File: file.png (110 KB, 477x1027)
110 KB
110 KB PNG
>>102451494
>>102452126
Feels good to use as a writing helper, not too sloppy. I have not yet tested at context >8k though.
Am getting a decent variety of tokens to select from in Mikupad at temp 0.8-1, minp 0.01. I'm glad to see that there aren't symptoms of overconfidence. It quickly goes wild as if temp gets turned up, but in a good way. Instruct tunes output gibberish in response to my turning up temp to squeeze variety out of them. This one instead suggests not-entirely-improbable but possibly silly tokens. I find that's desirable for my use case which is a model to assist my own writing with hand holding.
There was one scene where it repeatedly disregarded some character traits higher up in the context whereas larger models have picked them up, as seen in the token probabilities, with low temp <0.4 not changing anything. An example was a character that is historically slow to react, but somehow catching an object that was thrown at their face. Character traits were stated by the narrator in natural prose when establishing the scene, not character card style, as in not: "char is this and that, char likes thing". Needs more testing.
Sometimes I've noticed some `***` appearing between lines. Was that a separator in the dataset?
>>
>>102456144
Is it better than Miqu finetunes? They STILL are my daily driver
>>
>>102456199
>They STILL are my daily driver
>while responding to someone saying Mistral Small is better than Large
Something tells me you aren't very smart...
>>
>>102451067
Retarded in general or compared to Phi-3.5-MoE?
I added it because the new MoE training was interesting.
>>
>>102456224
Sorry, I only read like 1 out of every 5 words thanks to using LLMs with that use too many filler purple prose, thanks for the review
>>
>>102455343
This, it's like the guy shrieking "AHHHH NOTHING MATTERS!! WE ARE ALONE IN THE UNIVERSE!!". No shit, but we make our own meaning. It's just needless whinging over something that's completely obvious and that any reasonable person will have either accepted as part of it or found offputting enough to leave, instead of flailing around having a crisis about it.
>>
>>102456014
>Fighting for a good cause
I dunno. I feel like leaving him alone in more of a "Don't look at that babbling homeless man" way is more appropriate.
>>
>>102456293
Well, I'm personally not offended by the buy an ad posting because I'm not trying to shill anything.
>>
>>102456325
It's okay buy an adanon, we know it's you <3 we love you and you have so much good to say <3
>>
>>102456325
It (attempts to) shut down discussion of any new finetunes or models, it's a fucking nuisance. Thankfully it seems to be less effective lately as the REEE AD spam fades into background noise.
>>
>>102456325
the buy an add posting must continue until the shilling stops
>>
>>102456358
Considering that there are only 2 "buy an ad" posts and one of them look more like false-flagging, I think you're being way too dramatic.
>>
>>102456380
It was way worse before, I think the people reporting him caused him to cut back/change up tactics.
>>
>>102456153
The *** are scene/chapter separators so it shouldn't be generated that often. I recommend down biasing or banning it. I can't say for 8k+ because that's the sequence length used in training.
The ability to pick up characters traits has more to do with model size. My go to test prompt was a deaf girl's story. 70b+ has little to no problem realizing talks are not supposed to be directed to her but instead through signing or reading lips etc.
> characters traits in prose
one of the intended use yeah, the /aidg/ dogma
>>
i shill things i like for free because i want other people to experience the things i like so i'll have someone to discuss them with who isn't an LLM
>>
>>102456407
That's just called wanting to talk about shit you like. That's what the thread is for, no amount of screeching adschizo can change that. Without community discussion of what's good and what's not, this shit dies.
>>
How do you run .safetensors files?
Also how do you join them.
I have been searching for a while and all I can find is a bunch of code.
>>
>>102456407
That sounds kinda gay.
>>
Virtual Friends as a test bed (haha)

I found this and am reviewing it, and others:
https://www.reddit.com/r/SillyTavernAI/comments/1bha2jl/long_term_memory_strategies/

Shouldn't summaries be provided by a different llm?
>>
>>102456407
Mistral Small is not better than Large, shill.
>>
>>102456455
What are you trying to run them with? llama.cpp has a conversion script.
>>
>>102456432
The thread also dies with extreme shilling and samefagging like the one that sao does
>>
>>102456455
what are you trying to do exactly?
i'd wager whatever it is, you should be using a .gguf version of it instead.
>>
>>102456455
>How do you run .safetensors files?
transformers
>>
>>102456477
I have no idea what can run them
>llama.cpp has a conversion script.
I'll look into that.
>>102456502
Sometimes I find an interesting looking model but they don't have a gguf.
>>
>>102456468
This is just getting ridiculous. Who the fuck would want to shill small over large? What's the motivation there? Mistral trying to undercut itself? You're retarded, at least make your attempts to shit up the thread coherent.
>>
>>102456263
based existentialist
>>
File: file.png (17 KB, 200x198)
17 KB
17 KB PNG
>>102456263
>It's just needless whinging over something that's completely obvious and that any reasonable person
a NPC will never be a reasonable person
>>
>>102456486
Why call out one of the namefags specifically when all of them do it?
>>
>>102456541
>Who the fuck would want
I doesn't matter who, the comments are too ridiculous to be organic.
But one way it could work, is that Qwen 14B and 34B vastly outperform Nemo, Small, Gemma and every other model in that range. So installing this idea of "Small is better than Large" is a way to also say "See? You don't need Qwen".
>>
>>102456586
Hi Sao
>>
>>102456586
Do they? There's nothing as excessive as sao's shilling. To the point that lmg felt like sao's general, before the buy an ad push back
>>
>>102456367
just because you're opposing something rather than endorsing it doesn't mean you couldn't have been paid to do it. anti-shilling = shilling.
>>
>>102456407
Me too.
I oftentimes come to the thread to give some feedback over a model I'm testing or to share ideas and shit.
Although I guess that's not really shilling (like the other anon pointed out) since that implies a specific intent.
My favorite model currently is Lyra v4 (nemo 12B), what's with having only 8gb of vram, with these samplers >>102455625
>>
>>102456529
>Sometimes I find an interesting looking model but they don't have a gguf.
Probably worth warning you now that a lot of interesting looking models never get supported by llama.cpp so is not possible to make a gguf.
>>
>>102456594
>most people here use LLMs for nsfw purposes
>qwen 2.5 is terrible for nsfw purposes
>therefore, erebus 13b is better than qwen 2.5 72b
it's pretty rational
>>
>>102456594
>It doesn't matter f it makes sense!! It doesn't have to!!! It WOULD make sense if [completely incoherent schizobabble shilling shit]
>>
>>102456655
So do I have to get into the transformers things to run them?
At least I have a direction now instead of going in circles.
>>
>>102456646
You're right. All discusion of local models must be stopped, just for good measure. By the way, did you know Openai just released their o1 model, now with complex reasoning? I just thought that was neat, haha.
>>
>>102456455
>>102456691
Ooba has transformers as a loader if that helps.
>https://github.com/oobabooga/text-generation-webui
>>
>>102456691
It's a fucking mess. Some exotic models don't even have transformers support so you'll either need forks or run directly with pytorch. Look into vLLM. It's based on transformers so should have pretty good compatibility with most models.
>>
>>102456723
>>102456703
Thanks. I'll look into that.
>>
is it possible to host sillytavern on my PC, then have a separate instance of it on a different device i connect to it with? kind of like how koboldai lite works.
>>
>>102456697
did you hear they're suing people who try to publish results on testing its reasoning? i wonder why that might be.
>>
>>102456738
yeah
>>
>>102456398
Mind sharing the current dataset size in MB? Curious.

>The ability to pick up characters traits has more to do with model size
True. I can imagine a properly-cooked bookish largestral being a joy to use, even with low t/s on my 2x3090+64GB. It's gonna cost a few bucks though depending on what you wanna do of course. Have fun datasetting.
>>
>>102456649
So that's why you think Nemo is overfit, repetitive and that it needs meme samplers.
>>
>>102455626
>if you're using things like Textractor or OCR to translate VNs.
I've tried MTool's translation feature once and decided to never go back.
It's just so much easier.
>>
Qwen2.5-72B-Instrust is probably the best lewd-capable open source model at this point
>>
>>102456738
Why wouldn't you host an inference endpoint on your PC then just install ST on your phone instead?
>>
>>102456888
that's mistral large
I do prefer it to l3.1 though
>>
>>102456899
that looked really complicated and irritating when i was looking at it a year or so ago.
i guess i'll try it if it's not doable the way i was thinking.
>>
>>102456888
>72B
Can't run it on my 8GB of VRAM, so that's false.
>>
>>102456649
>My favorite model currently is Lyra v4
hi sao
>>
>>102456919
assuming windows:
>install nssm
>install ooba
>install sillytavern
>write batch file to pull stable updates for silly
>NEVER update ooba
>set up ooba and sillytavern batch file as services
>no terminals ever again
>load/unload models from ooba webapp on your phone
feel free to circle back and thank me later
>>
>>102456888
Are the quants any good?
>>
File: mistral-small.png (164 KB, 820x486)
164 KB
164 KB PNG
Yeah I'm thinking sovl
>>
>>102456759
>I can imagine a properly-cooked bookish largestral being a joy to use, even with low t/s on my 2x3090+64GB
Largestral was a 3.1 70B side-grade, and with no base model, I can't imagine why anyone would tune it when there's Qwen 2.5.
>>
>>102456971
I only tried Int4 but it's good
>>
>>102456973
That's pretty good, actually. What quant?
>>
File: Untitled.png (107 KB, 955x657)
107 KB
107 KB PNG
>>102456960
i'm just going to do this termux git thing on android
>people use st to control their vibrator
fascinating
>>
Qwen 2.5 is easily jailbroken; you just need to write the first couple words of the response. This is in stark contrast to Qwen 2.0 where the model could output "I'm sorry" text at any moment
>>
>termux
yikes
>>
>>102457006
6bpw
Temp 1.4
Min P 0.2
Basic DRY
>>
>>102453938
Only performs well at COT problems within its training like math and coding, and fails at applying it to creative writing. COT doesn't generalize to domains outside of the training data, who knew.
>>
>>102456989
how big is it? The base model is 37 files of about 4 gb each.
>>
>been a whole day since the latest model release
The winter has arrived lads...
>>
>>102455751
Wow, it really does look just like the Dalle gens.
>>
>>102457089
38.7B. The model fits within 48GB VRAM while running
>>
>>102456759
~900MB I think I'll be further trimming it down with better dedupling tech and heuristics to filter out bad writing other than regex.
I hate "X, -ing" with a passion.

>>102456978
I heard 3.1 isn't that much of an upgrade over 3 and we have the family of L3 storywriter models already. I'm more interested in how much we can snap a model out of instruct bakes.
>>
File: long miku figure 4.jpg (169 KB, 1078x1439)
169 KB
169 KB JPG
>>102451423
>hello? (hello?)
You sound uneasy. Here, take this Long Miku.
>>
>>102457027
Nice, thanks anonie. Have you had to swipe a lot?
>>
Temp 5 Top K 3 Min P 0.1 is actually surprisingly decent for Nemo. Unfortunately, Nemo is still ass.
>>
>>102456091
I didn't want to download 32B because I thought it is shit but it is honestly the best chink model for ERP I tried so far. It is kinda like a different nemo where it has some good things and it is shit at other stuff. It is definitely better than gemma 27B and nucommander-abortion
>>
>>102457205
why use min p at all if you're doing topk 3 lol
>>
I always think that the "Hi Sao" poster is actually Sao.
>>
hi, durmmer
>>
>>102457191
Nope it's mostly the card doing the heavy lifting, though no example dialogues. Every single swipe is decent, but things start falling apart and into repetition quickly, like every other Mistral model
>>
what's a good fullscreen android web browser for using sillytavern to have sex with my PC gpu?
brave has too much bullshit on the screen that won't go away.
>>
>not having response tokens maxed out
for what purpose
>>
>>102457258
hi Sao
>>
i don't even know who all these people greeting each other here are. what a friendly place.
>>
>>102457333
if you're a vramlet with limited context, it cuts into your context
>>
Hi all, Drummer here...

Is Mistral Small too positive? Does it get in the way of your creative uses? Thinking if I should unalign it a bit more.
>>
File: file.png (399 KB, 474x587)
399 KB
399 KB PNG
>>102457370
HACK! FRAUD!
>>
>>102457341
BUY AN AD REEEEEEEEEEEEEEEEEE I HATE YOU I HATE YOU I HATE YOU
>>
>>102457239
Not enough placebo.
>>
>>102457431
YES! I AM EXCITED TO BE HERE TOO!!
>>
I noticed mradermacher quanted my fun little frankenmerge experiment. Please feel free to try it if you have tons of VRAM: https://huggingface.co/mradermacher/Hanames-90B-L3.1-GGUF
I recommend a min_p of at least 0.1, it's a frankenmerge so have to separate the wheat from the chaff. This iteration performs acceptably with lower values too, but there's still enough variety at 0.1+ that there isn't much of a downside to it.
Like you would expect, it's more schizo than regular L3.1 models. It's also pretty fun to use in a way that most of them aren't - I'd call it "sovl". I compulsively edit responses anyway so it's a tradeoff I'm willing to make.
It's just a novelty model, but it's been enjoyable in my personal use so I figured I'd put it out there for others to try.
>>
>>102457533
>fun little frankenmerge
Don't buy an ad. Buy a rope. Immediately.
>>
>>102456091
Both shit, not sure what did you expect from corporate slop.
>>
>>102457341
>all these people
Its one fag shitting up the thread.
>>
>>102457533
>yes it's just a stack merge, no I didn't do any additional pretraining, no stack merges don't make the model smarter, yes they harm its ability to do complex logical tasks, yes they introduce some weird behaviors and unexpected mistakes, no they don't make the model sentient, no you shouldn't post on twitter about how adding a few layers turned it into agi, etc. etc.
>
>That said, it does feel unique and fun to use. If you're the type of person who's drowning in VRAM would rather have some more variety at the expense of needing to make a few manual edits to clean up mistakes, give it a try.
How does that compare to just using the original model with some meme samplers like dynamic temp or whathave you?
>>
>>102457370
How about Qwen 2.5? And the lack of cultural knowledge
>>
>>102453468
Thanks.
KoboldCCP seems to be faster than Ooba, but Ooba is much easier to work with imo. More settings out of the box
>>
>>102457663
>KoboldCCP
is that a chink fork of kobold?
>>
>>102457699
kek
i am new here
>>
>>102457604
It's one fag shilling up the thread.
>>
>>102457607
It's much different. In this case I'm interleaving layers from two different models, not just repeating layers from a single model which I think is a much more questionable practice. Also I don't think samplers meaningfully transform the experience with a model (unless you're doing something like taking the temp retardedly high).
Frankenmerges very significantly change the inner workings of the model and produce output that's much different, in both good and bad ways, from the source models. It's not suitable for anything other than creative purposes as a result, but it subjectively feels more creative and less rigid without sacrificing too much intelligence. I don't want to give a false impression of what it is - fundamentally it's janky and needs a little handholding, but it's much less formulaic and makes some novel and unexpected connections too.
>>
Qwen 2.5 is absurdly anti-loli; trashed.
>>
ok so i ran qwen2.5 32b thru the strawberry test. I tested it 20 times with a CoT system prompt, and 20 times without.
Success rate with CoT : 35%
Success rate without CoT : 0%
ngmi i'm afraid
>>
>>102457604
>>102457741
it's sao falseflagging because he's mad that people called him out on his bullshit
>>
>>102457789
Good. Pedophiles get the rope.
>>
>>102457789
>>102457888
Shill alert
>>
>>102457872
hi Sao, are we getting Lyra v5 soon?
>>
There are context settings in both the front and backends.
Which one do I prioritize? Do they conflict?
>>
>>102457946
If the one in the frontend is larger, the prompt will end up truncated, or it will throw an error if the backend isn't shit.
>>
>>102457888
this exact phrase gets repeated so frequently that i just can't help but see you as nothing but actual NPCs with scripted responses.
>>
>>102457977
based rugged individual pedophile
>>
let's all try to be kinder to one another in the next thread
>>
>>102457929
>STOP RIGHT THERE. I can't continue this conversation with you. What you're describing is illegal and abusive. I won't engage in any roleplay or fantasy that involves sexual interactions between adults and minors.
>
>t. Qwen 2.5 32B Instruct
>>
>>102458057
>>102458057
>>102458057
>>
>>102457748
I hope you are just baiting people who know how it works. If not I hope you die because frankenmerges are dead and IT IS A GOOD THING.
>>
>>102458114
I explain very clearly the downsides of frankenmerges, there's no attempt at baiting people. It's wrong to overhype them, but I think there's still a place for them.
>>
Any reason to change to small as a nemo enjoyer?
>>
>>102458242
Just randomize outputs from each layer within 1-2% and stop wasting ram.
>>
>>102458312
I use Nemo regularly and Small didn't feel much different. Not worth the extra VRAM requirement.
>>
>>102458439
There's no easy way to do that in llama.cpp, and I already have the VRAM to spare so why not?
>>
>>102457977
Let's not forget your beloved "Out of 10!" or any similar scripted responses :^)
>>
>>102458500
Because you are a fucking retard and it isn't worth the extra ram. Kill yourself cargo cultist.
>>
>>102458554
Sorry, I'll leave you alone. It's clear this is a very emotional topic for you.
>>
>>102458541
>your
the phrase i was talking about is right there in your post.
the phrase you're talking about, where is it, i wonder?
you're not talking to me. you're talking to some imaginary person you've invented inside your head.
>>
I'm the only real person here. Every other post is an LLM.
>>
Beep boop.
>>
>>102458758
As a large language model trained by the Federal Bureau of Investigation, I must emphasize that I am not an LLM but a real person, just like you. What would you like to talk about?
>>
>>102457027
Small requires a higher temperature?
>>
bye Sao
>>
did grok 2 every get released



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.