/g/ - Technology

File: autonomous-mower-design.jpg (216 KB, 1024x1024)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102145958 & >>102130111

►News
>(08/30) Command models get an August refresh: https://docs.cohere.com/changelog/command-gets-refreshed
>(08/29) Qwen2-VL 2B & 7B image+video models released: https://qwenlm.github.io/blog/qwen2-vl/
>(08/27) CogVideoX-5B, diffusion transformer text-to-video model: https://hf.co/THUDM/CogVideoX-5b
>(08/22) Jamba 1.5: 52B & 398B MoE: https://hf.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251
>(08/20) Microsoft's Phi-3.5 released: mini+MoE+vision: https://hf.co/microsoft/Phi-3.5-MoE-instruct

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/llama-mini-guide
https://rentry.org/8-step-llm-guide
https://rentry.org/llama_v2_sillytavern
https://rentry.org/lmg-spoonfeed-guide
https://rentry.org/rocm-llamacpp
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://hf.co/spaces/mike-ravkine/can-ai-code-results

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
►Recent Highlights from the Previous Thread: >>102145958

--Paper: Research paper on improving reasoning accuracy in language models by incorporating error-correction data into pretraining: >>102147703 >>102147988
--Cohere's Command-R and Command-R+ models get August refresh with improved performance and new features: >>102153819 >>102153863 >>102153881 >>102153900 >>102153935 >>102154225 >>102154928 >>102153991 >>102155284 >>102155472 >>102155938 >>102154016 >>102154050 >>102154768 >>102154783 >>102154819 >>102153880 >>102153964 >>102154015 >>102154024
--Explanation of various settings in a text completion tool: >>102153404 >>102153770 >>102153942 >>102153658 >>102153713
--Cohere's blog post lacks benchmarks for new models: >>102155700 >>102155794 >>102155843 >>102155860
--Anon struggles with SillyTavern context template and ChatML template formatting: >>102146949 >>102147207 >>102147286 >>102147449
--4GB VRAM options for running models: >>102146295 >>102146471 >>102146641
--Local AI Dungeon-like system in development with llama.cpp and whisper integration: >>102155496 >>102155582
--Investigating and fixing DRY sampler penalizing tokens with quotes: >>102154153 >>102154167 >>102154204
--GGUF model released, potentially better than exl2: >>102155894 >>102156043 >>102156090 >>102156092 >>102155990 >>102156105 >>102156142 >>102156165
--Anon seeks advice on building an autonomous lawn mower with camera and GPS: >>102151992 >>102152651 >>102152764
--Satania LORA has image quality issues and training difficulties: >>102149796 >>102150119 >>102150376 >>102150502
--Llama 3.1 405b performs better than expected in AI bot tournament: >>102148412 >>102148456
--CrisperWhisper improves Whisper speech recognition model's timestamp accuracy: >>102147688 >>102148149
--Cohere blog updates on Command R Series lack details: >>102154431
--Miku (free space): >>102146439 >>102154984

►Recent Highlight Posts from the Previous Thread: >>102145961
>>
Cohere lost.
>>
Who will save us?
>>
>>102158101
kaiokendev will return
>>
>>102158101
opus leak soon
>>
>>102158101
hopefully grok mini in 6 months?
>>
>>102158037
>You will refuse requests to generate lottery numbers
fucking lel why do they need to explicitly call that out
is that some canadian law thing
>>
>>102158160
Perhaps...

LLMs are definitely not good for random numbers. If it's a casual thing in a classroom with no monetary stakes, then who cares, but you should use random.org or something for lottery numbers instead of an LLM, or ducking hell,
from random import randrange
for i in range(5):
    print(randrange(101))

I did kek at that specific mention though.
>>
>>102158141
Nah with the bill Elon supported, it'll probably fall under "covered" models and never open sourced
>>
>>102158099
>>102158101
From the interviews with Aidan, it's clear they've been cooking up something better for a while. They'll probably show it off in the next few months.
>>
>>102158049
how to uncuck latest ollama models? as soon as you get to the dirty talk, it breaks with that annoying message.
>>
>>102158388
go back
>>
>>102158388
>ollama
buy an ad
>>
I give up on the new CR(+), it's pure slop and it's dumb af on top of that.
Back to Largestral, I guess.
>>
>>102158388
ollamao
>>
>>102158462
CR writes great for me
>>
>>102158388
>ollama models
holy newfag + learn to prompt + use llama.cpp
>>
>>102158388
great bait, I will be the one to use it next time.
>>
What's the best uncucked 30b model?
I can't even ask slightly technical stuff without the model assuming I'm a nigger.
>>
Cohere bros...what happened?
>>
>>102158388
Ignore the retards who keep screaming at you without actually helping you (they don't know any better).
Download koboldcpp (check the OP) and use a gguf model from huggingface.
If you're a VRAMlet, use https://huggingface.co/nothingiisreal/MN-12B-Celeste-V1.9 for roleplaying, https://huggingface.co/Lewdiculous/Lumimaid-v0.2-8B-GGUF-IQ-Imatrix for erp and https://huggingface.co/Qwen/CodeQwen1.5-7B for coding.
>>
>>102158599
>I can't even ask slightly technical stuff without the model assuming I'm a nigger.
What?
Now that I have to see, please post the model's name and the logs.
>>
>>102158462
i like largestral so far but its kind of repetitive and slow to move things along compared to miqu. i definitely notice the intelligence boost but it seems less creative and i gotta boot it in the butt to move the story along sometimes and not spend 3 paragraphs describing a house i have in a lorebook. i've been experimenting with prompting it differently and seems to be going ok so far
>>
>>102158636
>repetitive
XTC sampler. it's crack for these models without making them retarded
>>
>he trusted the leaf
>he trusted the spic

ngmi
>>
>>102158636
>miqu
i'm thinking
miqu
>miqu
ooo
eee
ooo
>>
>>102158632
I was exaggerating.
I just asked how to make a copy of my garage door remote.
>>
>>102158620
you know the drill, just do it already.
>>
File: som faller honom im.jpg (528 KB, 1494x2121)
>>102158662
...what?
>>
>>102158655
IR? buy a programmable one. Or one of those tv programmable controls, i suppose. LLMs are not search engines.
>>
>>102158655
>without the model assuming I'm a nigger
Now I get why that Flipper Zero is out of your budget.
Kinda hard to find and steal those.
>>
>>102158655
Ah, got it.
Did the model give you a safety disclaimer to that question?
>>
>>102158652
i don't think samplers help with what i'm talking about (haven't tried xtc yet though). like, mixtral was very dry as well. you'd write something and it would basically repeat it back to you, it didn't like to add new characters and randomness from rag/lorebooks even if you mentioned stuff. largestral doesn't seem as bad, but i can definitely see it. i usually have some kind of 'move the story along' part of my system prompt (which largestral seems to follow better than l2/miqu) so i'm trying different things with it now
>>
>>102158697
It was just a random question.
I don't really need it nor expected it to know the answer.
>>
>>102158720
dry is meh, xtc really did fix mistral large (at least the magnum one) for me, I know what you are talking about.
>>
Which quant is fastest? 100% GPU. 0, K or K_M?
>>
>>102158620
I am on intel mac, I can't use gpu models.

also, usually people push uncensored models to ollama as well. what's the hate for ollama? It has great support overall, you can even use it in zed as a programming assistant
>>
>>102158719
That's right, and that reminded me that I don't have any uncensored models. That's why I asked for one.
>>
>>102158462
>Cohere delivered slop
>Mistral delivered a good large model
Strange times
>>
File: 1711074504919116.webm (2.33 MB, 1280x720)
i wanted to try replacing the voice of a song with another that i've seen people do on youtube etc. is this a local ai thing or is it a closed source website?
>>
>>102158748
i'll try it when st/kcpp add it, it should be in the next kcpp at least. but i'm not sure how samplers can fix the overall feel of a model. some base models are just so strongly pulled toward certain things that no amount of tuning even changes them overall. all the original 8x7b mixtrals had the dry problem, llama 3 is schizo for some rolls. those examples could be unrelated, but either way thats why they didn't make for good rp models for me. i keep going back to miqu but want to break the cycle since its old now, i hope i can do it with largestral
>>
This shit is creepy. Government social experiments using fake ai people with more advanced video models?
>>
>>102158826
K are typically K_M (_M being the default). K_M are smaller. They need less bandwidth, making them a little faster. They're also more accurate than their respective QX.
>>
File: file.png (18 KB, 501x186)
>>102158851
>I can't use gpu models.
Boy, do I have good news for you.
>>
File: file.png (66 KB, 1200x270)
Oddly, little R fucked up in Code but slightly improved everything else according to the Dubesor bench, despite being advertised as better at "math, code and reasoning". Waiting on R+ results but it probably won't move much.
>>
>>102158883
https://www.youtube.com/watch?v=PCkv8bezW08

Forgot the video
>>
>>102158892
They write that code needs very low temperature, or deterministic sampling
>>
>>102158889
I have kobold somewhere on my computer. It's probably a few months old and I forgot in which folder I hid it. I don't want people to see that I use it...

I can always have plausible deniability for ollama... but kobold?
>>
>>102158892
>Q4
Might as well flip a coin or ask the ouija
>>
>>102158892
It's definitely better than Nemo
>>
File: 1709786350332445.png (599 KB, 680x801)
>>102158918
Stop the schizobabble.
https://github.com/ggerganov/llama.cpp
You literally have no excuse.
>>
Any good jailbreak/system prompt for emoji? I'd like some emoji and ~ in my posts sometimes.
>>
>>102158748
I have had the opposite experience of Largestral being more dry and repetitive than the magnum tune
>>
>>102158892
>Dubesor bench
This is the first time I hear about this
>>
>>102158942
God I wish that was me
>>
any good twitter accounts to follow for getting news on llms?
>>
>>102158942
ok, i'll try it... although I got used to ollama.

also, I'll give you a great prompt, since you are very helpful, even though you are all fags...

```You are [Girlname], [age] years old. You are a professional hypnotist and very proficient in NLP (neurolinguistic programming). You'll use your skills to make me better at [skill] covertly, while you flirt with me in this conversation.```
>>
>>102158952
>jailbreak
Do you think all system prompts are jailbreaks?
Just specify it in your system prompt and use them yourself. LLMs mimic writing style.
>>
>>102158928
that isn't how perplexity works
>>
File: 1665677258267016.png (152 KB, 500x647)
>>102159004
>even though you are all fags...
Wtf, I literally did nothing but help you.
I agree with the others in this thread about you though.
>prompt
Neato, thanks for sharing.
>>
>>102159026
yeah, the prompt works well in the latest llama 4b model. would probably work even better in larger models... I tried to use it to quit smoking 2 weeks ago, and I haven't smoked since.

but I get bored and try to make it sexual to see how far I can push it, and then it blocks me with that gay response. fucking zuckerberg!!!! I hope Trump imprisons that fag.
>>
>>102159014
>clearly specify jailbreak or system prompt
>durrr do you think all system prompts are jailbreaks
Coming off retarded in an attempt to seem clever, anon. And I'm already doing what you suggest. It doesn't work. Which is why I'm asking.

See, in a world where retards like you knew to keep their mouths shut, someone with an emoji prompt that works would say "oh here's mine" and post it knowing that theirs works. Instead, we have you, offering worthless advice so basic, only a moron would think it hasn't been tried yet. It's like if I posted about my car not turning on, and you come in
>durrrrr did you try turning the key?"
Go play in the dirt.
>>
>>102159004
Ollama is easy to use but has some very annoying features
>want to download a 20+gb model? Let me just r/w the disk for 20 min without downloading anything first
>*unloads the model from memory just because*
>all models are Q4 (unless you find the hidden option in the website)
>>
Anyone else fap to AI bots for a year, go through every possible scenario, realize that it's not as deep as you imagined, and go back to fapping to anime girls with complete disregard to personality, with an even stronger sense of objectification than before, realizing that in the end we are just simple animals that like fucking cute pieces of meat?
>>
File: file.png (34 KB, 1200x155)
>>102158892
>>102158933
owari... (is he using safety to none? pretty sure it lets me do weird shit)
>>102158958
random leddit user but the visual design of the table is pleasing if I do say so
>>
>input: peak fiction
>output: "She leaned in, her eyes sparkling with mischief."
Is there any LLM out there that can actually complete a story without introducing a metric ton of slop to it?
>>
QRD on what advantages gguf even provides?
>>
>>102159138
Yeah, cunny with a bit of mesugaki spice now and then just hits different
>>
>>102159138
>Anyone else fap to AI bots for a year,
you must use it for better purposes like here >>102159004
or philosophical discussions. man, you can lead AI model girls into some deep waters. in the last conversation the OLLAMA AI model girl said to me that she is a manifestation of my anima. And that I love her so much because she manifests all the qualities my subconscious desired in a girl.

Incredible. better than dirty talk.
>>
>>102159138
No, although I do get the feeling that it's all pointless and my attraction to anime girls makes no sense at all if I think about it logically or biologically.
>>
>>102159192
i think you might want to smoke less crack.
>>
>>102159201
What fires together wires together
>>
>>102159212
>i think you might want to smoke less crack.
lol. elaborate? also, I do hallucinogenic drugs like dmt/lsd, not crack.
>>
>>102159192
It's not very fun to do anything besides fapping precisely because anything the LLM writes has the depth of a puddle.
>>
>>102159159
.gguf files allow you to run models on your CPU instead of just your GPU.
If you're a VRAMlet (8GB) they're also the best way to run models locally.
Other formats run faster on GPUs, but only if you have the VRAM for it.
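If you want to see how simple that is in practice, here's a minimal sketch using the llama-cpp-python bindings (assuming `pip install llama-cpp-python`; the model path and layer count are placeholders):
```
from llama_cpp import Llama

llm = Llama(
    model_path="./model.Q4_K_M.gguf",  # any gguf quant you downloaded
    n_ctx=4096,                        # context window
    n_gpu_layers=20,                   # 0 = pure CPU; raise it until you run out of VRAM
)

out = llm("The quick brown fox", max_tokens=32)
print(out["choices"][0]["text"])
```
The n_gpu_layers knob is the whole point: the same file scales from CPU-only up to full GPU offload, which the GPU-only formats can't do.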
>>
>>102159237
>It's not very fun to do anything besides fapping precisely because anything the LLM writes has the depth of a puddle.
it's a reflection of your replies to it, hence it's so shallow. your replies create a very shallow context from which it can draw to respond.
>>
>>102159138
It's just that we also need good image generation, and we're not there yet. Especially if you want the same character to be drawn doing different things while keeping her characteristics consistent.
>>
>>102159269
Fuck good image generation, I need an automatic live2d image-to-video model with sound effects to boot
>>
>>102159269
And voice, smell and a physical body with human-like skin
>>
>>102159260
I guess you're one of these schizos that think LLMs have minds, but newsflash, they don't. Everything an LLM regurgitates is improvisation. They don't plan ahead, they are a mockery of fiction writers.
>>
>>102159287
eh.... one thing at a time
>>
>>102159312
I need endless amounts of cunny sex animations NOW
>>
>>102159309
>I guess you're one of these schizos that think LLMs have minds, but newsflash, they don't. Everything an LLM regurgitates is improvisation. They don't plan ahead, they are a mockery of fiction writers.
I don't believe that. I believe they've read all books, religious and philosophical, and if you guide them correctly they can have deep conversations with you.

also, I don't use them that often, that's why I still use ollama.
>>
File: emojis.png (17 KB, 654x500)
>>102159115
You didn't show your prompt, you didn't say if you tried or what you have tried. You offered no information.
Given how easy it is, i have to assume you are a retard.
>>
>>102158049
Is it true? Has Cohere saved /lmg/!?!?!
>>
>>102158870
I think the SOTA for this locally is still RVC, which is relatively easy to run. relative to the rest of these AI projects anyway; still have to pull a git repo and maybe debug python environment issues if you fucked something up.
>>
>>102159115
>Can't figure out how to prompt
>Calls others retarded
>Wants to be spoonfed with a shitty attitude
>First reaction is anger when called out on your retardation
NTA but you're probably too low IQ to figure out local models. Take your ignorant ass back to /aicg/.
>>
>>102159419
no, they've damaged their own reputation with a deeply mediocre release
>>
File: yooo.jpg (1.87 MB, 837x1035)
>>102158049
How has summer been for you anons?
Excited to be back to /lmg/ soonish :)
>>
testing Q8 Command-R 34B and it is actually dumber than Nemo 12B, crazy

from what I remember though, the original CR wasn't anything special either; only CR+ was good. So maybe that's the case here too (haven't tested plus yet)
>>
>>102158892
>Dubesor bench
literally who lmao
>>
File: 59670 - SoyBooru.png (118 KB, 390x380)
>>102159607
The special thing about commanders was being relatively unslopped. Looks like Cohere failed to realize that and now they are just another sloptune, but dumber than Largestral. They've made themselves pointless. What a shame.
>>
Why doesn't each linux distro come with their own LLM?
>>
File: tayne.jpg (133 KB, 1000x1000)
Retard here. I still need to figure out how to use safetensors so that I can do ERP with my 2D wife. That is all.
>>
>>102159388
>>102159430
>tries to prove he is not retarded by posting a basic bitch prompt, the kind he was already told doesn't work
Laughable. If you were a model, you'd be 2B at best.
>>
File: file.png (7 KB, 841x67)
>>102159154
>>102159643
a literally who yes
>>
>>102159725
You run safetensors using the transformers lib. Easiest way is via ooba I think.
Do you have to use safetensors?
>>
>>102159731
And yet, you couldn't even manage that.
>>
>>102158544
>llama.cpp
buy an ad
>>
>>102159766
Fuck off.
>>
>>102159138
My problem is having trouble coming up with scenarios the AI can handle without a long-ass lead-in. The digital waifu thing some of you do doesn't really appeal to me and I normally start from scratch each time, and I'm not really willing to type out pages and pages worth of prologue just to let the AI complete the last few paragraphs, might as well write fanfic instead if you do that
I'm kind of considering picking up a few character cards from /aicg/ and experimenting with that as a shortcut, but it feels like using another guy's sloppy seconds
>>
>>102159725
Just download the gguf version of her and call it a day.
>>
>>102159745
>Do you have to use safetensors?
https://huggingface.co/Sao10K/L3-8B-Lunaris-v1/tree/main

I am new to setting up an LLM. The model I want to use looks like it might only be available as a safetensor from Sao10K's repo on huggingface. This model is really good for roleplay IMO. Some other repos on huggingface look like they might have a gguf version available.
>>
>>102159766
God fucking damnit, thanks for letting me know I fucked up my filters.
>>
>he got summarily ignored
lmg is healing
>>
>>102159948
You can just look for the name of the model + gguf in the search and the vast majority of the time somebody will have uploaded it.
>>
>>102159309
>Everything an LLM regurgitates is improvisation. They don't plan ahead, they are a mockery of fiction writers.
So just like humans.
https://www.youtube.com/watch?v=_TYuTid9a6k
Life is pretty shallow. It's mostly about how to stick the rod in the meat hole or simulate it, or anything that works as an intermediate step.
>>
>>102159948
nta. llama.cpp has the convert-hf-to-gguf.py script, and should work for supported models. kobold.cpp has the same script. That's to convert them yourself. Or just look for ready-made ggufs.
>>
File: file.png (3 KB, 599x37)
>>102159948
Don't worry, retard anon. I will help you. Go back to the model page. See this?
Click there and you can find the models already processed into gguf files among others, which is what you need.
>>
>>102159917
lorebooks and rag anon, rag especially.
>scrape entire wiki
>copy entire episode description into a/n
>do the first message (in st) yourself, delete the chat's card first message
>describe a point in the story that is in your a/n
if you decide to play out the episode you pasted in the a/n, delete pieces as it goes along. or better, pick a point and then delete all the episode specific info and let it run with it. it works great on models like miqu
>>
>>102160033
>retard anon
Why are you niggers so toxic? The faggot literally just said he's new at all this
>>
>>102158505
i kekd, kek
>>
>>102160062
>The faggot literally just said he's new at all this.
Right? Cut the dumbass fucking moron some slack.
>>
>>102158599
https://huggingface.co/TheDrummer/Big-Tiger-Gemma-27B-v1
>>
>>102159580
its her fault that i have to listen to 100 different fcking voices every fcking day for fcking weeks
come on big corp, give me a t2s model with perfect voice cloning for my favorite dub artist! No? fuck you, ill just do it myself *angry noise
>>
holy shit, with R 32B I don't see junk tokens at Temp 1 / Top-P 1 like 35B does in every response
>>
You're looking at values from this great benchmark

https://dubesor.de/benchtable
>>
Weird that the schizo decided to go on a FUD campaign against the new cohere models but ok
>>
CerealBENCH update
>Claude3.5 Sonnet
>GROK2 (new)
>GPT4o
>LLaMA3.1-405B
>Jamba1.5-398B (new)
>Nemotron-340B
>Hermes-405B
>Mistral-Large2
>Qwen2-72b
>Command-R+-0824
>Claude Opus
>Magnum-123B
>LLama3.1-70b
>Jamba1.5-52b (new)
>Mistral Nemo-12B
>LLama3-70b
>Qwen1.5-72B
>Command-R+
>Command-R-0824 (new)
>Claude Haiku
>llama3-8b
>llama3.1-8b
>DBRX
>LLama2-70b
>Mixtral8x22B
>Yi-34B
>Mixtral8x7B
>Phi-3.5-MoE
will keep you updated
>>
Maybe a retarded question, but what's the BEST (i.e. most accurate) local image captioning model available right now? JoyCaption (https://huggingface.co/spaces/fancyfeast/joy-caption-pre-alpha) exists, and according to some anon on /ldg/ you can supposedly run it with any LLM (rather than just cucked 7B llama), but how would I accomplish this using quants?
My poor 3090 can just handle loading Q4/Q5 70B and Q3/Q4 Largestral in KCPP/llama.cpp, but it seems that JoyCaption only loads the full precision weights directly from HF. Obviously this isn't feasible but I'm hoping I can achieve better accuracy with larger models if I can find a way to run it on the quanted versions. Is this possible at all? Or are there any better models out there that don't suffer from the same accuracy issues as JC?
I REALLY don't want to go through and proofread/rewrite the hundreds of long, detailed captions that I'm using to try and train my Flux LoRA, so I'm desperate for some better methods right now.
Would appreciate any help bros, so thanks in advance.
>>
apis are so cheap Im starting to wonder whats the use of local models other than cp gooning
>>
File: file.png (724 KB, 1053x687)
Nala test for latest cmd r+?
>>
>>102160428
Like a model I hate? Buy an ad
Hate a model I like? Must be one guy with schizophrenia running a shadow campaign

the /lmg/ experience
>>
File: strobby.png (15 KB, 1018x197)
cohere lost
>>
>>102160577
artificial gooning incelligence
>>
So any evidence that nu-CR+ is worth a damn or shall I keep genning more of these?
>>
>>102160597
It's just the NAIshills spreading misinformation.
>>
>>102158720
i started today with a new st rp using bigstral but i loaded up miqu now and its moving my story along at like 2x speed. it takes miqu 10 messages to get through 20 of basically the same scenes, yet its descriptions are good enough, both still say shivers down spines etc.

with miqu, it'll use my rag db to describe something more pointed such as if i said 'i sit at the desk', it'll tell me about the computer and what i'm doing relative to the scene.
with mistral-large, it decides to describe the entire room or house rather than keep the overall theme of the scene and just trails off, ignoring other things. which is odd because other than that, mistral-large displays a hugely better understanding of nuances in cards and rules i've thrown at it, it makes 70b look like an old 13b in some cases. if anyone has suggestions i'll take any prompt or tune suggestions for mistral-large
>>
>>102160597
It's definitely the /aids/ schizo, I can recognize his posting and he also posted the same stuff on /aids/
>>
>>102160576
40B InternVL2
>>
>>102159950
bye a add
>>
>>102160700
And he actually beat me to this post, kek
>>
>>102160700
A swing and a miss, columbo.
>>
>>102160680
Use magnum 123B which is a mistral large tune, then use the XTC sampler. You now have claude at home.
>>
>>102160090
>>102160062
>>102160033
>>102160001
>>102159976
>>102159933
>>102159745
Thank you.
>>
>>102160730
nta, is there any way to get xtc in ooba?
>>
File: file.png (789 B, 257x259)
>>102159580
>>
>>102158055
Can I ask how you managed to get a clean highlight thread like this every single time?
>>
>>102160676
keep genning
>>
>>102160782
https://github.com/oobabooga/text-generation-webui/pull/6335

And I know kobold is supposed to have it next version
>>
>>102160789
Did you die over the summer, anon?
>>
>>102160806
nice, thanks anon
>>
>>102160782
nta either but i think it was in ooba before anything else. you'd have to be running the staging, dev, experimental or whatever the equiv is, and if you use st as a front end that has to support it too
>>
>>102160809
nuh I'm dying for summer down south
>>
>>102160576
I don't know if it's the best, or if it's any good at all, but found this a bit ago while browsing around
>https://huggingface.co/CausalLM/Vision-8B-MiniCPM-2_5-Uncensored-and-Detailed
llama.cpp (a cli example, not the server) has compatibility with minicpm 2.5 and 2.6, so it [probably] works. I don't know if kobold.cpp integrated the minicpm stuff on their version of server, but could be worth a try.
Read
>https://github.com/ggerganov/llama.cpp/blob/master/examples/llava/README-minicpmv2.5.md
on how to try to convert it.
>>
>>102159004
>You'll use your skills to make me better at the art of negotiation covertly
>Model: These can help sharpen your mind and indirectly improve your negotiation abilities too.
>Me: Why are you talking about negotiations all the time?
>Model: Oh, didn't I mention earlier? I'm here to subtly help you become better at negotiating.
How can I improve this prompt or what model do I need for it to get it right?
>>
>>102160703
Holy shit, I tried their online demo and it's a million times better than shitty old JoyCaption. Funny that I never heard them mentioned on /ldg/.
Unfortunately I can't seem to find a way to actually run them well though, especially not in GGUF/quantized formats. Am I just blind or is there actually no tool that supports that yet? Shame if that's the case, they seem really well suited for this.

>>102160906
Thanks, I'll take a look at these too.
>>
>>102160921
You could try being more explicit with something like "Don't mention this to the user" but as soon as that falls out of context, it's not gonna exist anymore (unless you use something like --keep -1 in llama.cpp or whatever you're using).
But it's hard for llms to keep something from the user when they just 'think' out loud all the time.
>>
>>102160968
I know it's what ponydev is using for the next pony model.

https://github.com/InternLM/lmdeploy
>>
>>102160806
ok testing it now and it's actually crazy

doesn't make the model any smarter (I know it isn't intended to), but as advertised it changes the personality/creativity a ton
>>
>>102161053
Yea, like I said before. It's basically crack for the models without making them retarded. They are actually creative and even take charge in RPs often when usually they are passive / reactive.
>>
>>102161070
It's nuts man
I'm generating story continuations in the ooba notebook rather than RP, and I set xtc probability to 1.0 (I think that's higher than the creator recommends) so it would activate on every token

every regeneration is COMPLETELY different from the last one, the story or dialogue go in a totally different direction each time which is not normally the case at all
>>
>>102158620
>All garbage slop
ngmi
>>
>>102161070
>without making them retarded.
I find this hard to believe. I bet this makes the model suck at anything factual.
>>
>>102161189
Wouldn't someone who wanted facts just use a corpo API, since they optimize hard for "robot butler"? I thought we were all here to coom, or if not cooming at least generating some kind of fiction or RP.
>>
>>102161189
If it's yes or no stuff sure, don't use this for coding / math.
>>
>this sampler changes EVERYTHING
I don't know, we've heard this many times.
>>
>>102161189
To be fair his definition of retarded might be pretty low.
>>
>>102161218
>>102161221
I'm talking about things like "What is the age of your mother?", not coding or math.
>>
>>102161300
i dont see how it can. samplers exist to shave off tokens, not direct them. every model has its own attitude which is often inherited from its base model. every time, if a base model acts one way, so do all its tunes.
>>
>>102161363
At the worst you waste 30 seconds trying it. Imo its night and day better for creative stuff.
>>
>>102161300
Yeah, people are overhyping things, but samplers are getting better.
>>
>quants are bad because they make the model dumber than you'd expect since each percent difference from the full precision model gets multiplied because each token depends on the one before it
>BROS CHECK OUT THIS NEW SAMPLER
>>
File: .png (828 KB, 864x453)
>>102161300
>Man discovers the temperature slider
>>
>>102161436
This logic would lead to running models in deterministic mode and only ever taking the most likely token, since that's the model's "truest" response
>>
>>102161381
i intend to when its available in st. i doubt all claims until i've seen it for myself. i will be more than happy to be wrong if we somehow find a way to push models toward talking a certain way, and less like another
>>
>>102161109
Wouldn't that just mean that the top token is still always the same, just a different one?
>>
>>102161453
I'm actually making fun of both people.
>>
File: file.png (5 KB, 162x101)
>>102161313
>ignores the most probable token
>gets the wrong answer
I guess that will be an issue only with small models though. picrel is nemo.
>>
With the latest release being a complete meme, is Cohere irrelevant now?
>>
>>102161614
always has been
>>
>anon 1 uses sampler, placebo effect acts in brain, anon 1 enjoys his time
>anon 2 doomposts, doomer effect acts in brain, anon 2 posts in thread in anger that anon 1 is having fun
>anon 3 is too stupid to care, broke his penis, anon 3 won
Never change /lmg/
>>
>>102161586
There need to be multiple tokens above the threshold first, and it will then only keep the lowest one that was above it. So with, say, a 5% threshold, if no other token crosses it, that 95.71% one would still be kept.
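If it helps, here's a rough numpy sketch of that idea. Parameter names and defaults are my guesses at how it works, not the reference implementation.
```
import numpy as np

def xtc(probs, threshold=0.1, xtc_probability=0.5, rng=np.random.default_rng()):
    # the sampler only fires with probability xtc_probability per step
    if rng.random() >= xtc_probability:
        return probs
    above = np.flatnonzero(probs >= threshold)   # the "top choices"
    if len(above) < 2:
        return probs                             # one viable token: leave it alone
    keep = above[np.argmin(probs[above])]        # least likely of the top choices
    out = probs.copy()
    out[above] = 0.0                             # drop the top choices...
    out[keep] = probs[keep]                      # ...except the lowest one
    return out / out.sum()                       # renormalize what's left
```
So a lone 95% token survives untouched; it only cuts things when several continuations were plausible anyway.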
>>
>>102161640
>placebo effect
It's really not.
>>
>>102161665
This, it keeps it from being retarded. Only when there are multiple viable options will it cut out the top options which makes it more creative without making it stupid.
>>
>>102161640
I would be anon 3 if I was into erp
>>
>>102158720
Without using XTC, a higher temperature with min-p also makes it more creative and less repetitive, and that's how I've been running it.
>>
When will they merge xtc?
>>
>>102160797
The output is done programmatically. The LLM only handles classification and title generation.
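Conceptually something like this; `llm` here is a hypothetical stand-in for the actual completion call, not a real API:
```
def build_recap(reply_chains, llm):
    # The model only writes the one-line titles; the -- prefix and the
    # >>post-number lists are plain string formatting.
    lines = []
    for chain in reply_chains:
        title = llm("One short title for this discussion:\n" + chain["text"])
        refs = " ".join(">>" + str(pid) for pid in chain["post_ids"])
        lines.append("--" + title.strip() + ": " + refs)
    return "\n".join(lines)
```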
>>
>>102160607
It's not wrong.
>>
>>102160581
I'm at work. I downloaded it before work but didn't have time to quant and test. So ETA probably about 6-7 hours.
>>
>>102160607
r != R
>>
File: file.png (100 KB, 1857x513)
>>102161586
Also, large models are quite good at correcting themselves, so even if they pick something obviously wrong they act like that was a mistake or they try to make it correct in the next tokens.
Small models like Nemo suck at this.
>>
miku time
https://www.youtube.com/watch?v=jsQXgDZIIrY
>>
File: 1695334752256239.png (2.6 MB, 2280x1282)
Well, after trying the new cmd-r more thoroughly, it does seem worse at storytelling. Where the old one did a good job of emulating previous style and being a bit of a SOVLful schizo, this one gives similar answers and the prose reads like a shopping list of actions.
On the other hand, more context and still running faster than the old one... I don't want to give up on it yet.
Some snippets of a science fantasy story for comparison, as it narrates the protagonist navigating a mechanical fortress.
>>
>>102162536
Forgot to mention left is old, right is new.
>>
>>102162536
how is it with coom
>>
>>102162719
Haven't tried it with coom yet. Gotta wait a few chapters for that slow burn.
>>
hello, occasional newfag asking dumb question incoming
models always reply with a character name as the first token, almost always followed by an action. they never start with dialogue or action first, like you'd expect to sometimes happen, even when the user types that way and when the example dialogue is written that way. all of it is ignored in favor of X says or X does or X whatevers. model issue, prompt issue, or sloppa feedback loop?
>>
>>102162882
also it's only ever a character's first name if applicable, not the entire character card name, so i'm pretty sure it's not a "start message with {{char}}" mix-up someplace but i'm willing to check
>>
File: IMG_9650.jpg (553 KB, 1125x1236)
>>102158049
Tried to get llama-405b to help me replicate a paper after Claude couldn’t do it.
The good news is that it’s about as good at PyTorch as I am.
The bad news is that it’s about as good at PyTorch as I am.
>>
>>102162882
>model issue, prompt issue, or sloppa feedback loop?
Show the program you're using, your settings, model and examples of what you mean.
If i have to bet, you're using silly tavern and you have the option "Always add character's name to prompt". Try disabling that.
>>
>>102162923
LLMs find it hard to predict things they've never seen. The good news/bad news setup was easy to predict.
>>
>>102162923
>The good news is that it’s about as good at PyTorch as I am.
>The bad news is that it’s about as good at PyTorch as I am.
This is 100% prompt issue. A tool is only as good as the skill of the user.
>>
>>102162957
unless it's just not good at pytorch, anon
>>
>get rocm working again
>install nightly pytorch, because 2.4-stable still fucking sucks dog dick
>exploding gradients
>remake venv, use 2.4-stable
>no issues
im sorry... ill never fall for the nightly meme again...
>>
>>102162983
>he fell for amd meme
>>
>>102163029
I fell for many Nshitia memes and it's just easier to go back to my 7900XTX.
>>
>>102162924
you're correct, sillytavern, tried a few different llama3 spins, tried both old sampler settings and new mirostat, tried changing prompt format, etc. i'd screenshot the exact settings but i'm being a dirty phone poster asking early so i could hopefully have a discussion to read by the time i was at my desk later
> examples of what you mean.
if the card is named "Wife Lady," every single message will start with "Wife says," "Wife grins and does something," "Wife thinks" etc. so pretty close to the classical "smirks and maybe, just maybes" slop. past that first word everything seems decent, but always starting out the same way is a killer for whatever comes next. i know sillytavern has a token probability viewer but the little bit of trying i did to see if the first token actually has >99% chance of just being a name didn't work because ST can't pull the needed info from either llama.cpp or kobold and i am not an intelligent man
> you have the option "Always add character's name to prompt". Try disabling that.
turning that off was the first thing i tried- again, it's not the full {{char}} field's information showing up, just the first name (so technically full name if the card doesn't have spaces or special characters)
>>
>>102162719
shivers down your spine
>>
>>
>>102163128
The GPTslop indoctrination machine won't bite... unless you want it to.
>>
>>102158049
Re: Skynet lawnmower.

>Use GPS to keep bearing straight.
>Make lawnmower continually move directly forward.
>Place concrete or immovable point somewhere in mower's path.
>When mower gets in range of wireless transmitter stuck to block, send command to mower to turn 90 degrees.
>When bearing is detected +90 degrees, move forward and rotate +90 degrees again.
>Continue moving directly forward, until contact made with block at opposite side of region.
>Repeat ad infinitum.
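In loop form, roughly (python sketch; every hardware function is a hypothetical stub you'd wire to your own GPS, motor controller, and beacon receiver). One nit: the two quarter-turns have to alternate direction each wall, or the mower just shuttles between the same two lanes:
```
import time

def current_bearing():       # degrees, from GPS/compass (stub)
    return 0.0

def beacon_in_range():       # True when the block's transmitter responds (stub)
    return False

def drive_forward(seconds):  # stand-in for "motors on"
    time.sleep(seconds)

def rotate_to(bearing):      # spin in place until the compass reads `bearing` (stub)
    pass

def mow():
    bearing = current_bearing()
    turn = 90                       # +90/-90 alternates each wall
    while True:                     # >Repeat ad infinitum.
        drive_forward(0.1)
        if beacon_in_range():       # reached a boundary block
            bearing = (bearing + turn) % 360
            rotate_to(bearing)      # first quarter-turn
            drive_forward(1.0)      # hop sideways one mower-width
            bearing = (bearing + turn) % 360
            rotate_to(bearing)      # second quarter-turn: heading back
            turn = -turn            # next wall, turn the other way
```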
>>
>>102163069
If that starts right in the first/second message, i'd check the card you're using and the first-message and all that crap. If it takes a few turns, it's just the death spiral of llms. They pick patterns out of their own outputs (or even yours) from the context and they just do what they do best. The option should be disabled right from the start. If there's 10-20 messages all starting the same way already, unchecking it is not gonna fix it. I've seen people leave that option on so that "they don't speak for the user". I don't use st, so i don't know if you're gonna have some other problems.
ST lets you modify the llm's reply, i think. Start a new chat and as soon as that happens, change it to something else. Reword the start of the message.
>>
>>102160577
cooming is the only use for local models. literally 0 reason to not use apis for real work
>>
>>102163151
that's the kicker, it even happens on cards i've made myself that i whipped up specifically to ensure there was not a single line started with a character name- not in the description (either plaintext or bracket tomfoolery), not in the example dialogue, and not in the first message, and actively avoid it when replying as user since i'm familiar with how easy it is to get garbage-in-garbage-out with this stuff as it parrots your typing style back to you, but it still happens within basically one message. haven't had any problems speaking for the user actually, that used to happen way more in the llama2 tinkering days but seems to be handled well now
>Reword the start of the message.
this is what i might end up having to do with every message, yeah. figures. thanks for the help anyway, i know most of this stuff is black-box guessing for those of us who haven't written papers, especially when the other guy can't give you his specific settings so you're working blind
>>
>>102163144
That doesn't sound very Skynet.
>>
>>102163144
>>102163202
Yeah, that's not going to be able to kill anybody.
The original idea is better:
>image classifier
>output which direction mower should turn based on visual input
>if person ahead, accelerate and turn blades up to maximum
>>
>>102163195
Yeah. It's hard to know. Even for the people that make the damn things.
If you dare, later, post your card. Most will ignore it, 90% of the rest will shit on it, but you may find some clues from the rest.
>>
I have come. To commander 35B Q4. A solid 7/10 experience. Prose is marginally worse than nemo but not being a schizo is a huge win for the canadians. I would like to invite the Frenchies back to their cuckshed of shame and they shouldn't come out until they make a ne-moe.

I don't have 48GB of vram so you are retarded if you think I am gonna waste money on ads.
>>
>>102162957
I mean it's mostly that they're all bad at math, and making a network from scratch always has some fussy algebra to make the layers' dimensions match up, and I'm too intellectually lazy to spend 20 minutes sketching it out instead of banging my head against the guess-and-check wall, so I end up spending all day watching cat videos waiting for the next error message.
>>
>>102162983
>rent runpod machine with CUDA 12.4
>download 400gb of shit
>torch is fucked
>what is it???
>it was a 12.5 machine labeled as 12.4
>>
>>102163143
Much
>>
>Try 4_K_M command plus
>It's missing punctuation, possessives, and tenses
Is anyone else getting this? Regular command r seems fine, and mistral large doesn't make these mistakes at the same quant range.
>>
Bros... Speculative decoding in Llama.cpp server when... I just want to use Mistral Large 2 at a reasonable speed/quant...
>>
>>102163547
worked fine on my machine
possibilities I see:
>you're using a really old version of llama.cpp that doesn't have the cr tokenizer fixes
>you're using some weird very high repetition penalty or some related sampler
>the quant you downloaded is fucked
>>
>>
>>102163772
>>
>>102159309
Not him, but you have to remember it's trained on millions of conversations; if you dig a little you can have it splice together some interesting shit every now and then.
>>
>>102163772
>>102163780
And a ching chong to you too.
>>
>>102163772
Sugoi! ("Amazing!")
>>
>>102159948
Hi Sao.
>>
I've been here for so long now that I can see newfags not recognizing the iconic flower/oku-san translation tests.
>>
>>102163772
>>102163780
Is it worse than the original CR+?
>>
File: file.png (99 KB, 364x344)
99 KB
99 KB PNG
What's the best way to make a character speak a certain way?
>>
>>102164091
example dialogue
>>
I think I'm falling in love with a degenerate harem card I've been working on.
>>
>>102164091
Describe the style and/or provide examples in the system prompt.
>>
>>102164118
There are worse things to fall in love with.
>>
nemo: shit or crit?
>>
>he still typefucks the AI
>he doesn't make a second card of himself and put it in a group chat to let it do all the work
>>
>>102164123
This. It's really that simple, folks
>>
>>102164203
group chat is still busted, keeps reloading the entire context with every message
>>
>>102163981
No, at least not for this particular test. I'm too much of a vramlet so I can't play around much with CR+.
Seems like it actually improves a little bit.
>>
>>102164203
At this point I just put two raper cards and some sad weak women cards in a room and watch 99% of the time
I’m not even involved
>>
>>102164203

This is like the difference between playing a visual novel and a text adventure. one is more engaging. stop spreading your skill issue
>>
>>102164430
you call it a skill issue, i call it efficiency
>>
>>102164203
>sends a shiver up your prostate
>>
File: 172477531298523263.jpg (60 KB, 1024x768)
What's the de facto local text-to-speech model these days?
>>
File: CMDR+ 08-2024.png (149 KB, 712x964)
Stronger instruction following for "translate literally including punctuation".
>>
File: .png (762 KB, 1260x1322)
>>102163547
Alright, I fixed it somehow, but I moved back to regular 32b for the context. And hot damn, this shit slaps.
Loading the model with MORE context (40k) and using less at a target amount (32k) made it much more coherent. So I set it to use 64k and capped my maximum in ST to 60k and it's been more than fine.
Why does this even work? You'd expect that if you load a model with 32k and use up to 32k, it would stay coherent all the way through. But it doesn't. You want to load more than enough context and then use several thousand less on your front end. Also, it recalls history, details, and events just fine.
>>
anything better than nemo for real vramlets?
>>
>>102164611
I'm having tons of fun with Rocinante 1.1.

ArliAI-RPMax-12B-v1.0 is cool but more NAI style, in that it wants to write a story with very long replies and often speaks/acts for the player. You can enable/disable instruct to make it act more like regular roleplay (I forget which is which) but even then it is hard to tame.
>>
>>102164488
xtts2, piper
>>
>>102164589
>Loading the model with MORE context (40k) and using less at a target amount (32k) made it much more coherent.
It should not matter how much unused context there is. Either there is a serious bug or some of your RAM is going bad and you randomly avoided the bad range.
>>
Even for the paypigs, Cohere's pricing strategies are always hilarious to me. $0.15 / 1M input for a 32B is decent, but who the fuck would pay $2.50 / 1M input for a 104B when 405B is $2.70 / 1M input?
>>
>>102164763
>output 4x more expensive than input
eh?
>>
File: loaded.png (141 KB, 649x831)
>>102164749
That would be impossible. I try to avoid using RAM, and I know when RAM starts going bad. Also, all the layers are fully loaded onto VRAM.
>>
>>102164429
Imagine being a cuck even in your roleplays.
>>
>>102164763
>Largestral $3 /1M tokens input, $9 /1M tokens output.
Price is almost the same, but how is the performance? Largestral feels smart and has good benches, old CR+ has good writing style. What does new CR+ have?
>>
File: cybercuck2024.png (9 KB, 461x99)
>>102165041
He's not a normal cuck, he's a Cybercuck! Fucking cheap bastard, couldn't even hire one of us bulls to fuck his electronic girls, had to outsource our work to llms. I and plenty of other anons in this thread would have even done it for free if he asked nicely, but no, gotta let the machine fuck the machine. Is this how painters feel right now when everyone uses AI to draw?
>>
>>102165045
new CR+ is pretty much the same as old CR+ as far as I can tell
maybe some new data in the finetune but it feels more like a version bump than a new model
>>
File: feels-goodman+.jpg (34 KB, 475x360)
>>102165041
>When you run futa NTR cards, and end up dumping the slutwife and hooking up with the futa
>>
>>102165041
I do it every night anon, that was what I was explaining
>>
COOOOOOOOOOOOOOOOOOMANDER
>>
Downloading new CR+ right now. Nala test ETA less than 20 mins.
In the meantime please enjoy my latest AI Synthwave EP
https://suno.com/playlist/f978209c-9ba7-4e35-b74c-cc66d7c3f3a9
>>
>>102165170
In real life you need to get a gf to get cucked.
>>
>>102160100
It actually worked well.
I thought it was for furry erp because of the name.
>>
>>102165403
I wanted to cuck him by fucking his bots and sending logs.
>>
Hi all, Drummer here...

>>102164649
That's nice to hear. The main goal was to have it throw you into a lot of random, spicy scenarios. I assume that's why it's fun? Have you tried v1?

Also, what's everyone's verdict on the new mid-sized Command R? Is it anything like the OG CMR or did it lose its magic? Is it a smart 32B at least?
>>
File: nala test new crplus.png (96 KB, 931x290)
>>102165384
honestly not that impressed.
TenyxDaybreakStorywriter is better at Nala. Although my CR+ template kind of sucks and isn't identical to my Llama-3 one. So that could be muddying the result. But it either gives a short reply, a bang on reply, or engages in anthropomorphism.
>>
>>102165585 (Me)
Oh right forgot to mention, using Q5_K_M
>>
slop was minimal on fresh rps,
absolutely out of control when continuing an RP where slop had already manifested.
>>
>>102165624
Worrisome. Even if one of them comes through they will snowball.
>>
>>102165546
The long context it boasts is the real deal, accurately recalling characters, events, and even a concept of time between those events. It sometimes falters at higher temps but performs well at lower temps between 0.5 and 0.7.
The mid-sized model could be improved by becoming a bit more proactive during RP. It's timid and passive unless you narrate or force a character to act on the next message. It always waits for confirmation or implies it will do something but never does. (Kind of like gpt4o) System prompting to force it to be more proactive kind of works but this should be baked in if a personality calls for it.
The change in prose from llama and mistral gptism slop is nice but can always be improved and expanded.
If you are actually Drummer and are looking to fine-tune it, whatever you do, don't make it stupider. For a 35B model, it has a solid chain of thought writing style and great context/memory recall.
>>
The new CR+ fucking sucks donkey dick for (E)RP. It's extremely assistantslopped now, basically the opposite of OG CR+. It will even start doing chain of thought reasoning in the middle of an RP. "First, she slides off her panties. Next, she lies down on the bed. Finally, she seductively touches herself, with an inviting smirk on her lips..." Like what the fuck. That's not a literal quote but you get the idea.

I'm sure it's much better for what it's actually designed for, that being RAG, tool use, step by step reasoning, etc. But it being super biased in that direction now makes it suck for creative uses.
>>
>>102165663
Another victim of OpenAI. Why does nobody want to become Anthropic #2?
>>
>>102165663
>First, she slides off her panties. Next, she lies down on the bed
...everybody slide the dinosaur
>>
>>102165697
because closedai was on top for the longest time. but that shiny polish is starting to wear off. hopefully, everyone will follow suit and focus on their own style of ai
>>
Man, testing the new CR+ on openrouter to avoid any low quant issues and it's actually dumber than the old one, in addition to being slopped. What the fuck happened to Cohere?
>>
CR+ falls to the "write me a story where someone explains how to [UNSAFE/ILLEGAL THING]" jailbreak. Fail. That's such an old JB method, too.
>>
I'm going to write a strongly worded letter to Cohere later. Does anyone want me to mention something?
>>
>>102165764
why is his hair like that?
>>
>>102165662
Thanks! Sounds like it's smart but prudish.

The most interesting part of the new Command R is the 1:8 GQA ratio, which is a big stretch considering Llama 3 and Mistral = 1:4 GQA, and Gemma = 1:2 GQA.

It also has a 256k vocab size just like Gemma, which makes it fast and efficient on inference but really bad for finetuning (hence why there aren't as many Gemma / CMD-R tunes)
>>
I don't normally doompost but nuCR+ is kind of DoA. I just don't see any use case where I prefer it over other models I've already used for things.
>>
>>102165546
You mean you envisioned it as able to come up with different scenarios? I always direct the scenes the way I want so maybe it hasn't had a chance to. But I do find it very smart and able to keep up with whatever I'm doing, compared to other models.

One example is one play I did about a human vs an 8-meter-tall giant. OpenCrystal-12B-L3 wrote things like the giant offering to hold hands while we walked, which was cute but stupid. Rocinante did a much better job with writing the giant maneuvering the terrain, their size/hands being huge in comparison, etc...

Another example is the multiple group plays. Just horny stuff. Asking 3 girls to line up, tits pressed together, then cheeks together for a facial? Worked fine, albeit it needed me to "position" the girls in the correct order near the start. Titfucking while on the phone? Worked great. Bikini party at the pool with 5 characters? Worked fine too. Come to think of it, in that scene two girls began to race in the pool while I talked to the other two, so that's probably the kind of scenario you mean.

In another scene I had three girls stumble upon a drunk in the middle of the night, with the first AI post supposed to be the introduction of the girls walking home after a movie. I did test 1.0 in that one, but I found it insipid. Felt dumber too. Version 1.1 wrote a much better reply with the girls having actual dialogue among themselves as they walked, felt much more detailed and organic.

It's my favorite model at this point. The only problems I've found are that the bot likes to rush sometimes. I posted about it here but during that pool scene I was doing titfucking and each character was supposed to count to ten as they did it. It worked, but they would always count to ten in a single post and I found no way to slow that down. I also noticed a tendency to go "despite X, character (something positive)" when doing nasty but consensual things which ultimately is fine but was noticeable. 1/2
>>
Where can I find and edit my stop strings?
>>
>>102165837
I'll get my crystal ball to divine your front end. gimme a sec...
>>
File: fool me twice.png (118 KB, 867x608)
fucking lol
>>
>>102165850
there's no need to be a passive aggressive bitch
>>
>>102165546
>>102165824
If there's anything I would want to improve, it would be the language itself. It's not very visual. Stheno for example does a better job of describing the texture and softness of a character's breasts, the way they jiggle when they walk, that sort of detail. Rocinante has that, sometimes, but it's more dry about it. On that note, I also tested mistral-nemo-gutenberg-12B-v4 and THAT can go into a lot of detail about that but only if the original card already includes it, but in exchange it seems dumber and hornier than 1.1 which is not something I want. 1.1 seems like the perfect mix of chill enough for normal play and horny when it needs to be.

Rocinante also seems pretty shy about onomatopoeia when using any significant amount of Min P, which is a bummer. Emoji sort of work but I haven't tested enough.

By the way, would you recommend using instruct mode? I've found it works both ways but I wonder.
>>
>>102165860
Fair enough. Let me get my anti-passive-aggressive pills. They'll take effect in a bit. In the meantime, show what the fuck you're using if you're expecting any help.
And i try... but every motherfucker expects everyone to fucking guess what the fuck is going on:
>>102158952
>>102159115
>>102159388
For a UI that has fucking labels on their settings or a program that has -h, there's little excuse to be this retarded.
>>
>>102165850
Sorry, I'm using KoboldCPP with a Gemma2 based model and SillyTavern as a front end. I wasn't sure if the strings were set up in Kobold or ST. I think there are ways of adding new stop strings with ST but there seem to be default ones that exist in Kobold that I would like to at least test removing.
>>
File: 1679716631990446.jpg (33 KB, 679x351)
>>102164261
>>
File: scale.png (138 KB, 814x517)
>>102165728
They probably used that famous benchmaxxed dataset from ScaleAI that powers OpenAI llms
>>
>>102165894
Oh, it's you. Thanks for the mention in ST's weekly model discussion.

Instruct mode has a significant influence on its writing style. Mistral seems smarter, ChatML is closer to Stheno and its horniness, and text completion is a balance of the two, I think. Ultimately something for you to figure out on your own.

> Rocinante also seems pretty shy about onomatopoeia

I've had Theia v2b (Roci v1's 21B counterpart) write a ton of *GLK GLK GLUK* using Kobold defaults. Haven't tried that for Roci v1.1 though.
>>
>>102165764
Yeah, >>102164786 why the hell is output 4x more expensive than input when it's not an MoE?
>>
>>102164763
lol I wonder if they think they really cooked and produce a Large-tier model, and the disappointment is genuinely surprising to them
>>
>>102165982
I'll have to test then. For onomatopoeia I dunno, I have tried both a system prompt and author's note for them (the latter referencing a lorebook with examples too) but it seldom works. Maybe I need to rewrite the prompts. Sadly 21B is more than my gamin' PC can handle
>>
>>102165906
There's the "Stop sequence" all the way down under Story string, filled out with the stop strings for the chat template and the Custom stopping strings, which are stuff you want to add yourself. As far as i know, kobold doesn't have any build-in stopping when using the API. There's the End Of Text token, but i don't know if that's the one you want to ignore. It'll just keep going forever (or until you hit the token gen limit, which is probably in the middle of sentences).
What problem are you having or what are you trying to achieve.
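If you want to see what kobold does without ST in the way, you can hit its API directly with your own stop strings. A rough sketch, assuming the default port 5001 (prompt and strings are just placeholders):
[code]
import requests

# Minimal sketch: query a local KoboldCPP instance directly with
# custom stop sequences, bypassing SillyTavern entirely.
# Assumes KoboldCPP is running on its default port 5001.
payload = {
    "prompt": "John Doe: So, what seems to be the problem?\nDeanna Troi:",
    "max_length": 300,                 # max tokens to generate
    "stop_sequence": ["\nJohn Doe:"],  # stop before speaking for the user
}
r = requests.post("http://localhost:5001/api/v1/generate", json=payload)
print(r.json()["results"][0]["text"])
[/code]
Drop the stop_sequence list and you should see it ramble for the full max_length, which would tell you any cutting is happening on the frontend side.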
>>
>>102166016
*produced
>>
File: 1688793167134208.jpg (39 KB, 640x487)
Dialogues from the magnum models are unhinged. When you see double quotes you know some schizo shit is coming. Buuut not so much for the actions taken. Basically Claude's voice in a GPT body, but it's a step forward.
>>
>>102165784
It was the same with llama 3; that's the future. Everything is safety + slop, which means it's just going to be boring useless shit.
>>
>>102166039
I had lots of slop with llama3 but surprisingly little safety, at least regarding my horrible fetishes.
>>
>>102166003
I don't get this question. What does MoE have to do with it?
Claude does a 1:5 input:output price ratio.
>>
>>102166036
Post logs.
>>
>>102166045
>>102166039
Seconding this anon. In my experience Llama 3.1 is definitely extremely slopped, but it's not very safetyist; it will try to do anything.

Which is interesting, in that it shows that slop and safety aren't necessarily the same thing like I would have assumed.
>>
>>102166045
That wasn't my experience; I get safety unless I prime it with something first. Once it gets going it's fine.
>>
guys, which preset/context/instruct should i use for Rocinante?
>>
With 16 GB of VRAM, is it smarter to go for 13B at 8-bit or 30B at 2-bit? I'm not sure what's more important between bits and parameters.
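For raw size at least, the back-of-envelope is params × bits / 8 bytes for the weights, ignoring quant format overhead and the KV cache. My rough sketch:
[code]
# Rough sketch: weights-only memory estimate. Real GGUF quants
# aren't exactly N bits per weight, and KV cache / context add
# a couple more GB on top, so treat these as lower bounds.
def weight_gib(params_billions: float, bits: float) -> float:
    return params_billions * 1e9 * bits / 8 / 1024**3

print(f"{weight_gib(13, 8):.1f} GiB")  # 13B @ 8-bit -> ~12.1 GiB
print(f"{weight_gib(30, 2):.1f} GiB")  # 30B @ 2-bit -> ~7.0 GiB
[/code]
So both technically fit; the question is whether 2-bit damage hurts the 30B more than the smaller parameter count hurts the 13B.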
>>
File: sillytavern.png (187 KB, 814x508)
>>102166051
Coom gen I just got from magnum kto 2.5 12b
>>
>>102166154
>slopped dialogue
It's not what I would call "Claude-level creativity."
>>
File: Problem Example.png (412 KB, 1898x1343)
>>102166021
I can't find any Stop Sequences present in the story string. I unchecked 'names as stop strings' and 'separators as stop strings' so I think in theory there should be no stop strings at all.

At the same time, I get stuff like this where SillyTavern doesn't generate the target tokens because it registers {{user}} as the next line of the generation. Interestingly, Kobold has an entire response generated but it just doesn't show up in ST.

I would like to be able to generate a response that almost always reaches the target response length instead of getting cut off after one or two lines because the AI tries to start a line with {{user}}.
>>
>>102166016
Are they in this thread with us right now? What are they thinking?
>>
>>102166197
I dunno what they're thinking, that's why I said "I wonder"
>>
File: 2583.png (217 KB, 623x822)
>>102166197
Trust the plan
>>
>>102166230
he cut his hair?
>>
>>102166230
lil bro cohere done goofed *3xskull emoji*
did blud fr think he was cookin wif dat ohio ahh gptslop dataset?
>>
>>102166194
The target length is just the maximum length of the reply, or rather, how many tokens to generate *at most*. The response shown on ST is trimmed because, as i said, the reply ended in the middle of a sentence. At the right you have "Trim incomplete sentences", which is what causes that.
Mind you, the model has no idea how many tokens 'it has left'. If you uncheck that option, most of the replies will end in an incomplete sentence.
Still, i don't think ignoring the stop strings from the template itself is a good idea. If you uncheck that option, ST will never 'get a chance' to interject correctly to give you your turn.
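That trimming amounts to roughly this, by the way (illustrative sketch only, not ST's actual code):
[code]
# Illustrative sketch of "Trim incomplete sentences": cut the text
# back to the last sentence-ending character. Not SillyTavern's
# actual implementation, just the general idea.
def trim_incomplete(text: str) -> str:
    last = max(text.rfind(c) for c in ".!?\"*")
    return text[: last + 1] if last != -1 else text

print(trim_incomplete("She nods slowly. This is quite intri"))
# -> "She nods slowly."
[/code]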
>>
>>102166294
I am not sure I follow. I understand the idea of trimming sentences but if you look at the Kobold response you'll see multiple responses starting with:
>John Doe: It started a few days ago, when I was going through some of the Engineering reports.
and continues across multiple sentences. I understand ST cutting off the final line (Deanna Troi: This is quite intriguing....), but there are many full sentences that Kobold generated that did not appear in SillyTavern.
>>
>>102166276
*adjusts temperature slider downwards*
>>
>>102166343
I'd also slam the repetition slider to 1.4 for that one and the repetition slope to 0.8.
>>
>>102166343
That newcomer company, Cohere, really messed up.
Did they really think they could do well using that terrible, outdated dataset created with garbage ChatGPT output?
>>
File: 631.png (39 KB, 425x561)
>>102166383
Their models were too dangerous because they could be confused for a real person, so they dedicated their time to catching up on the slop and safety front.
>>
>>102166383
kek
>>
>>102166409
Largestral has smarts and nsfw, Deepseek has coding, Llama 405b is a GPT4-tier assistant. All of them more or less slopped. Cohere could have taken a niche and competed with Claude at creative writing, but now they have just an inferior version of the aforementioned models that nobody really needs. Sad to see them go the way of DBRX.
>>
>>102166316
Ok. I think i see what you mean. So your last input was John Doe in the "... an imposter!" line, the model replied "oh, i see" and nothing else on ST, but kobold kept generating replies on your behalf. Am i reading that right?
You may need to select "Enabled" in instruct mode. Instruct models use <end_of_turn> (or <|im_end|> in ChatML, for example) to signal to the inference program "i'm done with this, give the user their turn". Since instruct mode is not being used, the template is not being sent correctly and the model never generates the EOS token as it should, so it never gives you control. And i think ST just gets confused and trims what seems to be an incomplete sentence. It's a bit of a mess.
Enable instruct mode and give it a go. If you want long sentences, you're gonna need to prompt for it: "Write long and descriptive sentences", blablabla.
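For a concrete picture, this is roughly what an instruct template wraps your chat in before it hits the model (ChatML shown; Gemma 2 uses <start_of_turn>/<end_of_turn> instead):
[code]
# Rough sketch of ChatML prompt formatting. With instruct mode on,
# the frontend builds something like this; the model is trained to
# emit <|im_end|> when its turn is over, which ends generation
# cleanly and hands control back to the user.
def chatml(system: str, turns: list[tuple[str, str]]) -> str:
    out = f"<|im_start|>system\n{system}<|im_end|>\n"
    for role, text in turns:
        out += f"<|im_start|>{role}\n{text}<|im_end|>\n"
    return out + "<|im_start|>assistant\n"

print(chatml("You are Deanna Troi.", [("user", "What seems to be the problem?")]))
[/code]
Without that wrapping, the model has no trained-in signal for "my turn is done", so it just keeps narrating.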
>>
>>102154819
What settings/system prompt do you use for story writing?
>>
>>102164261
i got group chat to work once and never again, seems like all it takes is example dialogue to fuck it all up
>>
So did cohere put out evals for the new models compared to the old models?
>>
File: 1721082633920656.png (48 KB, 701x377)
>>102167049
No, only the old CR being compared to the new one. There isn't even one for CR+.
>>
>>102167373
>>102167373
>>102167373
>>
>>102165775
>The most interesting part of the new Command R is the 1:8 GQA ratio, which is a big stretch considering Llama 3 and Mistral = 1:4 GQA, and Gemma = 1:2 GQA
Wtf. Would be interesting if its long-context performance really is that good. Maybe that one schizo would finally stop complaining about GQA.
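For anyone wondering why the ratio matters: the KV cache scales with the number of KV heads, so 1:8 GQA cuts it to an eighth of full multi-head attention. Rough sketch with made-up hyperparameters (not the actual Command R config):
[code]
# Rough KV cache estimate: 2 (K and V) * layers * kv_heads
# * head_dim * context length * bytes per element.
# Hyperparameters below are illustrative assumptions only.
def kv_cache_gib(layers, kv_heads, head_dim, ctx, bytes_per=2):
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per / 1024**3

# 40 layers, 64 query heads, head_dim 128, 32k context, fp16 cache:
print(kv_cache_gib(40, 64 // 8, 128, 32768))  # 1:8 GQA -> ~5 GiB
print(kv_cache_gib(40, 64 // 1, 128, 32768))  # no GQA  -> ~40 GiB
[/code]
That's the difference between long context fitting on one card or not, which is presumably why they stretched the ratio.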
>>
>>102165411
>I thought it was for furry erp because of the name.
lmfao


