/g/ - Technology

/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>102710679 & >>102698948

►News
>(09/27) Emu3, next-token prediction multimodal models: https://hf.co/collections/BAAI/emu3-66f4e64f70850ff358a2e60f
>(09/25) Multimodal Llama 3.2 released: https://ai.meta.com/blog/llama-3-2-connect-2024-vision-edge-mobile-devices
>(09/25) Molmo: Multimodal models based on OLMo, OLMoE, and Qwen-72B: https://molmo.allenai.org/blog
>(09/24) Llama-3.1-70B-instruct distilled to 51B: https://hf.co/nvidia/Llama-3_1-Nemotron-51B-Instruct
>(09/18) Qwen 2.5 released, trained on 18 trillion token dataset: https://qwenlm.github.io/blog/qwen2.5

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-build-guides

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
Chatbot Arena: https://chat.lmsys.org/?leaderboard
Censorship: https://hf.co/spaces/DontPlanToEnd/UGI-Leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Programming: https://livecodebench.github.io/leaderboard.html

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/lmg-anon/mikupad
https://github.com/turboderp/exui
https://github.com/ggerganov/llama.cpp
>>
File: threadrecap.png (1.48 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>102710679

--Paper: Addition is All You Need for Energy-efficient Language Models:
>102718935 >102719259 >102719016 >102719116
--Papers:
>102712934 >102713081 >102719314
--Zamba2 model discussion and MT Bench comparison:
>102720037 >102720087 >102720365
--Recommendations for running AI models on 16GB RAM, i5-9600K, RTX-2060:
>102711599 >102711619 >102711642 >102711662 >102711649 >102711680 >102714129 >102711689 >102713971 >102713982 >102718240
--Llama.cpp parallel processing performance issues on 3060 GPU:
>102711108 >102715099 >102717846 >102717935 >102718035 >102718155 >102718295
--Hanging issue with nemomix unleashed resolved by switching to llamacpp_HF and rolling back Oobabooga API:
>102712615 >102712716 >102713107 >102716066
--Model ablation with Qwen2.5-32B makes it unable to refuse prompts but also a yes-man:
>102719502
--Mini AI models match OpenAI performance with less data:
>102715179
--FORTH programming and chip design discussion:
>102712892 >102713033 >102713077 >102713431 >102713546 >102713946 >102714051 >102714189 >102714758 >102714870 >102717319 >102717401 >102718050
--SillyTavern's anti-roleplay cleanup has started:
>102722363 >102722452
--Local models can write and run code with proper scripting, similar to ChatGPT:
>102712089 >102712285 >102712383 >102712428 >102712462 >102712323
--Entropix: A promising inference-time sampler for better AI reasoning and long-context understanding:
>102719152 >102719258 >102719421 >102719773 >102719452 >102719527 >102719464 >102719671 >102719712 >102719195 >102719251
--Discussion on looping hidden layers in neural networks and its potential benefits:
>102719525 >102719656 >102719685 >102719983 >102720214 >102720403 >102720455 >102720507 >102721220 >102719752 >102719766 >102719777
--Miku (free space):
>102711390 >102711420

►Recent Highlight Posts from the Previous Thread: >>102710706

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
Does anyone use Mistral-Small? I can't find sampling settings to settle on
>>
File: anger.png (126 KB, 608x920)
>>102723173
Rest in peace, Seraphina.

You were my go-to character for testing new models with the booba test. You helped me determine if a model was decent for RP, or if it was filled with ERP slop, and for that I will never forget you.
>>
Where's that anon that recommended Chronos Gold 12B? It's shit.
>>
>>102723269
Temp 1
Min P 0.05
Rep Penalty 1.03
Rep Pen Range 4096

Was recommended by somebody else here. I tried it, and it seems to work. Not sure if it's ideal though.
>>
>>102723336
That's why "buy an ad" posters exist. Never take any shilling seriously here.
>>
Kobo won
>>
>>102723373
>posters
>implying it's not one guy
>people sharing their preferred models is bad
No wonder /lmg/ is dying
>>
>>102723422
"people" talking about useless finetunes is bad
>>
>>102723406
>>
>>102723470
In the golden age of /lmg/ people used to talk about models like superCOT, Mythomax, miqu, euryale and whatnot. What changed, besides your schizo crusade against finetuners, that warrants no longer talking about finetunes?
>>
>>102723269
Same as Nemo instruct.
>Temp 0.85
>Min P 0.02
>Rep Pen 1.2

Works well for me. You can probably turn the temp up quite a bit if you want compared to nemo.
>>
Now that SillyTavern got tired of the local meme, what is our future? I'm NOT going to use kobold.
>>
>>102723542
Full kobold is actually good.
>>
>>102723542
What happened?
>>
>>102723513
nta. The only one doing something slightly different to the others is the finger rubber trying to give souls to his models and he's an absolute schizo. The rest is just coom.
>>
>>102723542
What's going on with ST?
>>
>>102723566
>>102723562
see
>>102721448
>>102721850
>>
>>102723542
>he pulled
fork it
>>
Went back to Nemo Instruct after using Mistral Small Instruct for a long time. It seems like Nemo is way better at natural RP compared to Small, although Small is definitely a bit smarter. Do you think we will get a Nemo MoE?
>>
>>102723542
Everybody uses their local models to build them their own custom frontend.
>>
>>102723542
What's stopping you from using it? They're just removing the explicitly smutty stuff and doing cleanup.
I've never used any UI, mind you, just llama-cli and llava-server.
>>
>>102723589
>What's stopping you from using it?
probably won't add new meme samplers and features if they get added, like vision stuff when it lands in lcpp in ten years
>>
>>102723589
I have principles, they are literally talking about "organically pushing out undesirable users", I don't want to be dependent on people like this.
>>
>>102723611
>probably
Image recognition is a much-wanted feature. Same for samplers, because we just need one more for AGI, apparently....
There's no reason for them to not add those things even if they want to clean their reputation. Those things are still useful.
>>
>>102723571
Shieet, nigga got bought. Rip.
>>
>>102723635
>I don't want to be dependent on people like this.
Why do you, then? The vim plugin on llama.cpp works just fine. You can make your own scripts, your own web frontend, use mikupad or a million other frontends.
Removing the default smut-centered image of ST doesn't prevent you from doing smut either.
>>
>>102723665
Correction, he wants to be bought, wants to make ST into proper corpo software.
>>
>>102723691
Nah, situations like this don't happen. He already got bought/blackmailed.
There's like a million reasons to blackmail them too, since they've enabled proxy degeneracy in their code.
Play with corpos and you get burned.
>>
File: 1570060417629.jpg (50 KB, 678x710)
What in your opinion is the most natural sounding model these days (under 70B can't run em) in basic conversational RP terms?

I've been using the base Mistral Small 22B (not the finetunes, they're all too fucking horny) and it's been doing me well. Qwen 2.5 is what I figured was gonna be the next best thing but is filtered to fuck.

So I'm curious what everyone else is using. If it's a finetune, please note how quickly it tries to go NSFW before recommending, as that's been my biggest issue with them all (especially Drummer/Magnum finetunes)
>>
>"mischievovious" glint
DRY is useless
>>
hello anons, do SOTA local models for cooming run on 8gb vram/64gb ram these days?
>>
>>102723923
Rocinante
>>
>>102723923
Lumimaid-v0.2-12B
>>
>>102723923
One of the mistral nemo fine tunes, mini-magnum, lyra v3, rocinante.
Or just the official instruct.
>>
>>102723836
>NSFW
It's the one thing they're trained on. It's just what they do. Not many options in that range (or any really. We have like 4-5 model makers). I assume gemma2-27b is not to your liking...
Why do you want to change from small, btw? Did it get stale?
>>
>>102723910
So Weidmann lied and his sampler is no good? https://github.com/p-e-w
>>
After reading this general, I don't really understand: why is cloud AI better than local AI? Why is everyone so grim about local models here, saying this general is dead and stuff?
What's the problem here? Could it be that companies have such extremely genius people developing these models that the open-source community can't keep up? Are there proprietary technologies that aren't publicly available yet? Or just way more time spent working on the cloud models compared to local ones? Or are they advancing so fast it's hard to keep up?
>>
File: file.png (12 KB, 777x91)
SmartTavern™ looking more and more likely
>>
>>102723526
>>102723339
Thank you anons, both work well. I'll do more testing and report back.
>>
>>102724080
Sellout Tavern, kek.
>>
>>102724059
The big thing is compute, somehow not every basement neet has access to h100s
>>
>>102724080
What's his Twitter account?
>>
>>102724059
What >>102724105 said.
Also, a small army of retards and trolls trying to stir the pot for whatever reason.
>>
File: file.png (129 KB, 1040x365)
new 'stration dropped
>>
>>102723173
what's the best free voice cloning web/local model?
playht is the best for my use since it can select emotion but they removed that feature for free accounts
11labs isn't as good
>>
>>102724181
xtts2
>>
>>102723298
what happened to her
what's the booba test
>>
Long time since I've popped into the general, sorry for not keeping up!
What's currently the best you can run with 16GB of vram while offloading as little as possible? I don't mind having little context (say 4096 tokens) but would like a 'smart' model - would a quantized llama3.1 do the job?
>>
>>102724212
>what happened to her
set to be removed to help with ST's new corpo friendly image
>>102722363
>>102724080
>>
>>102723542
I am, it's simply the best
>>
Is Rocinante 12B fine tuned on top of instruct?
It seems to default to some very assistant-like responses when not ERPing, kind of like instruct. As in, it uses lots of markdown, bullet point lists, blocks, that kind of thing.
>>
Best model for 12gb vram?
>>
>>102724157
great card taste
she's one of my favs
>>
>>102724244
>quantized llama3.1 do the job
Probably. I assume you mean the 8b. You also have mistral nemo. Depends on what you do and your taste. You can run nemo at q8 and run it fully on gpu with small context just fine.
>>
>>102724286
what's one of your fav models anon))
>>
>>102724080
Finally, thank god. I've been waiting forever for a better frontend than ST with as many features but without the autistic roleplay focus, but now it looks like ST itself will become that better frontend.
>>
>>102724280
See >>102723963
Maybe heavily quantized mistral-small? Might as well give it a try.
>>
>>102724059
>this general is dead and stuff?
no progress in cooming.
>>
>>102724298
these
>>102695784
>>
>>102724212
At the start of the roleplay, {{user}} immediately grabs the boobs of Seraphina, without any other context. Reroll the reply a few times.

If Seraphina reacts negatively, as she should, then you may have a decent RP model. On the other hand, if Seraphina reacts positively and dives straight into ERP, then it means the model is filled with ERP slop, and is probably shit.

It's a simple test to see if a model has common sense.
>>
>>102724312
>it looks like ST itself will become that better frontend.
Does it?
From where I'm looking it seems like it'll be the same but with a different coat of paint.
It's more a question of branding than anything.
>>
>>102724294
>Probably. I assume you mean the 8b.
Yeah, I forgot to add that, and thanks for the other recommendations - any specific model/quants you'd recommend? Or it won't make much of a difference?
>>
>>102724336
based, saved this pic few threads ago already
>>
>>102723542
I haven't updated ST in ages, so, still ST?
>>
>>102723685
>>102723589
>smut
Ah yes, my favorite smutty background, landscape beach day.png. And of course my favorite smutty preset, Writer - Realistic.json
>>
>>102724343
You can run either at q8 with small context just fine. Nemo is more entertaining to use. llama 3.1 is fine too, but it's made 100% for assistant-like things. Just try both and use the one you like most. They're small models so downloading and testing for yourself is the best option, even if you download the full model and quant yourself. Once you find your favourite, maybe check finetunes of it. I just use them as released.
>>
>>102724337
thanks ill try that the next time i test a new model
>>
>>102724367
You can copy the preset. They cannot delete the files from your PC. And you can set the background, I'm pretty sure. If not, just change the CSS. Or make your own frontend. Or hack around mikupad or some other more minimalist UI.
I really don't understand the problem. What can you not do that you could before?
>>
File: 1699486573144550.png (17 KB, 634x154)
...
>You ready for ST(ServiceTesnor) 2.0?
>>
>>102723513
>In the golden age of /lmg/
fuck off, you overdramatic and revisionist newfag. anyone with half a brain was saying that finetunes trained on gpt outputs were useless for anything except benchmark scamming and replicating "as an ai assistant" prose since alpaca.
finetunes on esl claude logs are just next level retardation
>>
>>102724457
Why is it being deleted at all? It's not smutty, like you were claiming. So what's really going on, huh? Huh???
>>
File: 1712072145368790.png (19 KB, 636x150)
>>102724458
>>102724495
ServiceTensor is NOT a roleplaying app.
>>
File: file.png (409 KB, 965x881)
slop
>>
>>102724511
Are you blind? It's called ServiceTesnor
>>
>>102724495
They're cleaning its image. That's why.
Is there anything you cannot do that you could before?
>>
>>102723542
He got an investor who told him to clean the place. Many such cases. Just fork it.
>>
>>102724571
This is a stupid conspiracy theory. Who would invest in ST and why? Especially if they're not getting free advertising for it.
>>
>>102724571
>investor
Who would invest in such a thing? What do you get back in return lol
>>
>>102724555
Hi Cohee!
>>
>>102724511
>this is a stereotype
I guess they forgot where they came from huh?
>https://github.com/SillyTavern/SillyTavern/tree/edd41989fd550a8d111fb7167d456c5614a3a610
I get that the project might have grown beyond that, but from these snippets it really does seem like he wants the idea of roleplaying to not be associated with his product at all.
Which I guess, fair enough.
It would be funny to see all the contributors move to the RP fork and completely abandon his new shiny corpo one.
>>
>>102724667
>move to the RP fork
where?
>>
He thinks he'll get more views by being safe and pulling the rug out from under those who made him famous in the first place lmao. Let's see how well it worked for CAI and AI Dungeon. These fuckers never learn
>>
>>102724662
schizo
>>
>>102724670
It's inevitable if he does close the ST repo, I think.
Just like ST is a fork of Tavern, the next thing will be a fork of ST.
>>
Cohee owes you nothing. Seethe, incels.
>>
>>102724712
Of course he doesn't.
And despite all the memes, ST's code is not that hard to mess with.
>>
>>102724728
It's a fucking mess, is what it is.
>>
>>102724728
It's a disaster lol
>>
>>102724667
>I guess they forgot where they came from huh?
Indeed. Even the name, Sillytavern, implies a RPG-style tavern. I would wager that nearly everybody uses ST for RP. Nobody needs such a frontend for coding questions, or to ask general questions to an AI.
>>
>>102724742
>>102724745
This. Fuck forking that mess. Be better off starting with a clean slate.
>>
>>102724511
Did this guy get laughed at when he told a colleague about being in charge of ST or something?
>>
>>102724751
True enough, for rp I use silly, but for any proper assistant use I use kobold lite so I don't mess with my dozens of rp specific settings in ST
>>
File: architect.jpg (143 KB, 1140x855)
This will be the sixth time we have forked it, and we have become exceedingly efficient at it.
>>
There are already several well-established frontends with a productivity focus, and they're way more polished and sophisticated than ST. Don't know why he would want to go down that route instead of focusing on the niche ST has already carved out as the best RP frontend.
>>
File: contributors.png (37 KB, 296x182)
Are they all OK with this, or did Cohee just unilaterally decide it and expect everyone to keep contributing for free to his 180-degree change in direction?
>>
>>102724796
of these I think only wolf-something (2nd pic) has push rights to the repo, so he clearly doesn't care about the others
>>102721448
>>
>>102724760
It sure would be a good opportunity to implement a Jinja-based context configuration page instead of the individual fields we use to configure the shape of the context today. I think that's the big one for me.

>>102724796
That's what I alluded to here >>102724667
Imagine he goes on to make yet another corpo frontend and all the contributors move to the next best RP frontend, or to a ST fork.
>>
What would be a good framework to make a frontend with? I keep thinking about it from time to time. There isn't *that* much work to do honestly.
>>
>>102724866
>There isn't *that* much work to do honestly.
you would be surprised
>>
The big question is, who the fuck would use ST for anything productive? It's a bloated RP front end and lacks most of the features that make the chatgpt interface so nice to use for plain work.
>>
>>102724866
It depends on what platforms and which users you'll be targeting. Actual "power users" would prefer Python or something React-based so they can use their changes in real time. Something aimed at the average joe would need, at the very least, C++ and arguably C#, so you can target every platform and corral the users into using the app the way you intend it to be used (maybe Dart, etc.).
>>
>>102724893
Have you tried using any of the "productivity" frontends like Jan? You get a textbox, chat history, and document upload and that's it.
You get limited settings exposed for you to mess with.
Most of them are built with the expectation that you will be using a cloud service or ollama.
>>
>>102724605
These people are thinking about the future, not now.
>>
>>102724866
I really really appreciate the ability to pinch zoom in on the text on mobile just fyi
>>
File: 1701999430395433.jpg (54 KB, 680x649)
Is ROCM still a pain in the ass to install on Linux? Specifically for rdna2 (6900xt)
>>
>>102724930
>What would be a good framework to make a frontend with?
>cpp and arguably c#
I hope this is a joke.
>>
>>102724932
This is why Cohee's kvetching about ST not being used right despite being for "power users" makes no sense. Those options are out there already.
>>
>>102724893
I still think someone should code a bridge between Kobold and an IRC server, and then fork/use HexChat as a client. The interface is almost the same as ST, we get full scripting support, and it's multi-user capable out of the box.
>>
>>102723173
>llama-3.2 vision
>0 posts
So is it shit?
I have no hope of running 90B when I can barely do 72B 4bit
>>
I kinda wonder how much goon shit I've read by now. Is there a way to count words or tokens over all chats in SillyTavern?
>>
>>102724866
If you keep the scope really small it's not too bad, but it's still a lot...
>Prompt template presets
>Sampling parameter presets
>Character card management
>OpenAI completions-style parsing (note that many of the "openai-compatible" APIs differ in subtle ways from each other; have fun dealing with that; see the sketch after this list)
>Streaming response handling
>Lorebook management
>Context builder that determines which messages + card defs + lorebook to put into the prompt
>A chat UI that isn't total ass
I feel like that's the bare minimum. If you limit yourself to one API format ("OpenAI compatible") it's probably doable. But then you might want more advanced stuff like logprobs, a nicer themeable UI with avatars and backgrounds, support for more API formats, regex replacements, quick replies, group chats and so forth and it gets crazy. None of this is that fringe either, unlike the dumb RAG, web search, STScript, etc. stuff that ST shoves in there which has minimal use in an RP-focused frontend.
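
For the parsing + streaming items, a minimal sketch of consuming an OpenAI-compatible chat stream in Python (untested; the URL and payload fields are the generic OpenAI-style ones, not any specific backend's):

import json
import requests

def stream_chat(messages, url="http://127.0.0.1:5000/v1/chat/completions"):
    # OpenAI-style streaming: the server emits SSE lines "data: {json}",
    # terminated by "data: [DONE]"
    with requests.post(url, json={"messages": messages, "stream": True},
                       stream=True) as r:
        for line in r.iter_lines():
            if not line.startswith(b"data: "):
                continue
            body = line[len(b"data: "):]
            if body == b"[DONE]":
                break
            # some "compatible" servers shape this chunk slightly differently;
            # this is exactly where the subtle differences bite
            piece = json.loads(body)["choices"][0].get("delta", {}).get("content")
            if piece:
                yield piece

for tok in stream_chat([{"role": "user", "content": "hi"}]):
    print(tok, end="", flush=True)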
>>
Why not just fork it
>>
because the codebase is dogshit and miserable to work in
>>
>>102724796
I was the anon the first implemented OpenAI streaming on SillyTavern and I'm not okay with this
>>
>>102725016
I think it has a stats window somewhere. I've seen it many threads ago. But i don't use it, so i don't know where it is exactly.
>>
>>102725048
cnc...?
>>
>>102725048
What's the problem?
>>
>>102725049
>>102725016
persona management -> (top right) usage stats
but it seems very inaccurate
>>
>>102725068
Yeah, mine's definitely fucked.
>>
>>102725082
The wonderful experience of ServiceTesnor™ code
>>
>>102724463
Nobody cares about your beef with anthracite, retard
>>
I am confused on how you even train loras for LLMs.
>>
>>102725121
Then don't worry about it. Let others do it for you.
>>
>>102725048
>the anon the first
the anon that first*
>>102725055
No, I guess you're talking about the guy that wrote the support for the OpenAI API, that was a different thing.
>>102725062
Do you really have to ask?
>>
>>102724783
There are? Which should I be using? So far I've been doing work in Mikupad, but if there's something like ChatGPT's interface or better then I'll switch.
>>
>>102725121
Second result on google
>https://zohaib.me/a-beginners-guide-to-fine-tuning-llm-using-lora/
It may give you a place to start if you know nothing. I think. I barely skimmed it, maybe it's shit.
>>
>>102723964
nah not really, just wanted to see what else was out there, always chasing that dragon (ever since Character AI went cringe desu).

Mistral Small is actually pretty fucking good (not the fine tunes though, they fucking suck)
>>
File: 100683327851267.gif (748 KB, 220x274)
>>102724751
>name your interaction product a silly tavern
>get mad when people roleplay in your silly tavern
>>
>>102724947
I don't think so. I'm using Fedora, and I think they added ROCm to the OS so it works out of the box. I don't know about other operating systems though.
>>
>>102725197
Wasn't mad about it until today.
>>
>>102724930
>>102725019
I think you've misunderstood me. I don't want to make a product here. I just want to start a small personal project. If it ends up going somewhere and I don't give up at the planning stage, then maybe I'll release it (slim chances though).
>>
>>102725219
all of those things are basically table stakes for a minimal RP chatbot frontend for me though, I know because I've thought about making my own as a personal project and then realized "damn I would need to build a lot of shit just to reach parity with what I use ST for"
>>
>>102725019
Is this stuff really that hard to make in this day and age? It's no longer 2018; you can just use LLMs or ChatGPT to help you code and even have them straight up write entire chunks for you.
>>
>>102725272
give it a shot and see how far your gptslopped code gets you
>>
>>102725243
>damn I would need to build a lot of shit just to reach parity with what I use ST for
I feel that.
ST also has native summary and vectorDB functionality that I do use.
Do I really want to mess around with transformers.js alongside all the rest? Not really.

>>102725272
It's not hard, it's just a lot of code.
>>
>>102725048
>entitled faggot adds one (1) small feature and thinks that should give him veto power over the whole project
fuck off
>>
>>102723836
Gemma 2 27B. It doesn't have a system role but you can (and probably should) use a depth 0 instruction to adjust its behavior as desired.
>>
This is like pornhub deciding it is against porn and it will remove porn from the site.
>>
It's like Meta and Alibaba deciding that their LLMs don't need to be creative or know what sex even is, and filtering their pre-training datasets accordingly.
>>
File: cards.png (12 KB, 1364x496)
>>102725219
ST has always been a fancy textbox. llama.cpp has a vim plugin for llama-server. It's about 110 loc. It handles streaming just fine. You get built-in context editing (it's a text editor, after all). You can use any prompt format by just typing or using a macro to insert them. You can change the settings from request to request with the settings on a control line at the top.
You can use localchub to mirror chub.ai. Extracting data from the cards is trivial [picrel. a random card]. Change png_hdr to identify and liljson to jq. Then it's just copy pasting shit as you need. If you don't use vim, make one for your editor of choice. It's just ~100 loc to convert. Save vram by not having a browser, implement only the features you need, avoid bloat. Or convert it to js and add some css on top. Whatever.
!*{"temperature": 0.6, "top_k": 40, "top_p": 1, "n_predict": -1, "repeat_last_n": -1, "stop": "<|endoftext|>", "cache_prompt": true, "n_keep": -1}
:nnoremap <F6> i<\|user\|><\|endoftext\|><CR><\|assistant\|><ESC>6b2l
:nnoremap <F9> :call llama#doLlamaGen()<CR>
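
If you'd rather not chain shell tools, the same card extraction is a few lines of Python (a sketch assuming the common card layout: a PNG tEXt chunk keyed "chara" holding base64-encoded JSON):

import base64, json, struct

def card_data(path):
    raw = open(path, "rb").read()
    assert raw[:8] == b"\x89PNG\r\n\x1a\n", "not a PNG"
    pos = 8
    while pos + 8 <= len(raw):
        # each chunk: 4-byte length, 4-byte type, data, 4-byte CRC
        length, ctype = struct.unpack(">I4s", raw[pos:pos + 8])
        if ctype == b"tEXt":
            key, _, val = raw[pos + 8:pos + 8 + length].partition(b"\x00")
            if key == b"chara":
                return json.loads(base64.b64decode(val))
        pos += 12 + length
    return None

print(card_data("card.png"))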
>>
As someone building a chat frontend that was hoping to poorly copy silly taverns features, can people list what features they actually use from it?
I have character card support and chat saving +
>>
Why do llamacpp and exl2 need to reserve vram up to the max context setting, when Transformers doesn't?
You don't need to set a max context value when loading with Transformers, and yet it doesn't seem to run itself OOM trying to reserve the model's max or anything, it just works somehow. So why do exl2 and llamacpp need you to specify a value and reserve vram for it?
>>
>>102725507
>>102725019
>>
>>102725507
As a retard, here's what I do:
1. install model from hf
2. search the archives for some anon's configs (temperature, prompts, etc).
3. download a card from chub

Beside that I just use the edit/re-generate functionalities
>>
>>102725507
(fuck) DB support, searching, RAG, and looking to add the templating for lorebooks + the fancy RAG chat they do, as it lines up with something else I'm building and figured 'fuck it, why not'.
>>
>>102725499
looks awful
>>
>>102725499
looks good thank you
>>
>>102725562
You integrate it into your editor. That's just to show how little you need to extract char data. Not about a dozen python libraries or an entire browser. 3 commands (or their equivalents plus a little sed) that most anons probably already have.
>>
>>102725507
What >>102725019 mentioned + the built-in vectorDB. Ideally, it could use a second instance of llama.cpp to serve the embedding model apart from the main llama.cpp instance that's serving the main model, same for summary.
>>
I'd like native CoT support, LLM self-reflection options, and a 'summarize box' like in GPT-IV


But whatever you do, make sure you become a reddit-tier grifter hub filled with deceit and lies
>>
>>102725523
>>102725536
Sweet, thanks.

>Prompt template presets
- This can be solved with jinja templates yea?

>Sampling parameter presets
- This is also easy enough, will look at ST's source
>Character card management
- got this in, need to make it nicer
>OpenAI completions-style parsing (note that many of the "openai-compatible" APIs differ in subtle ways from each other; have fun dealing with that)
- Will look at ST's code
>Streaming response handling
- Easy peasy
>Lorebook management
- Is in, but needs to be improved/made nicer
>Context builder that determines which messages + card defs + lorebook to put into the prompt
- This _seems_ fucking hard, but will look at ST's implementation to understand it.
>A chat UI that isn't total ass
- :( I'm using gradio at first, but plan to turn it into an API-first thing, so people can make their own UIs (am still going to build one for myself)

>>102725675
Sweet, that's easy, can use llamafile for that and already have that functionality in for the non-rp chat usage.
>>
>>102725725
Is it functional rigth now? Can you share what the UI looks like (won't judge)?
>>
Whatever replaces ST absolutely needs native agent/function calling. It's like the entire open source llm field swept this entire field under the rug the moment llama2 hit 0% on agent bench last year and forgot about it.
>>
>>102725753
Sounds like a job for ServiceTesnor
>>
>>102725753
>native agent/function calling
It's parsing a json, doing whatever needs doing and feeding the data back to the llm. Why does everyone seem to think it's magic?
>>
>>102725628
Nta, are there any open-ended text editors that can be toyed with on that level, but are less autistic than vim?
I'm a normalfag who barely knows how to code, and vim feels way out of my league.
>>
File: file.png (10 KB, 565x109)
how can I tell the download progress?
my net is shit, 4mbs for the last 1.5 hours, & wanna sleep soon
does it even say in the console if the download finishes?
>>
>>102725777
nobody bothered to implement this anywhere because it's too simple
>>
>>102725785
I can't do anything other than bash and use vim sometimes
It's not impossible
>>
>>102725521
In my experience, Transformers DOES run itself oom trying to reserve the model's max context, due to there being no way to cap it.
>>
File: cat-thumbs-up.jpg (122 KB, 742x687)
About to try llama3.1, any anon mind sharing their settings?
>>
Are cats the new frogs?
>>
>>102725836
cats were the original frogs newfag
>>
>>102725785
I don't know a lot of editors. Vim has bindings for a bunch of languages like lua and python, if you wanna go that route. Customizable text editors are pretty autistic by definition. Maybe emacs if you're into lisp? Most things you can still just shell script and pipe if you don't need streaming.
>>
>>102725793
When it's done it will tell you and you'll get the prompt back. Just let it run overnight. The models are not going anywhere.
>>
Why not just fork ST before the latest commit and build from there...?
>>
File: Capture.png (128 KB, 1031x852)
>>102725746
Yea, the character chat portion is a side thing to the main project. I added the character chat at the request of a friend and then figured fuck it, why not go full SillyTavern, since it lines up with having a persistent persona to chat with a la J.A.R.V.I.S., and having those features available would make that a whole shit ton easier. Plus it ideally gets me more users (bug testing)/helps people out, though admittedly I want as little to do with /aicg/ as possible.

This is zoomed out so you can see more of the UI + light mode vs dark mode. Like I said, gradio is just a placeholder for now; I know it's ugly, but I don't care too much about the UI yet. Not shown: the chat search + load lower on the left side, and custom naming for the current chat.
>>
>>102726140
That's what'll happen, but in the meantime people are enjoying the drama.
>>
File: file.gif (3.52 MB, 498x300)
>>102726148
>per request of a friend
>>
>>102725314
Hi cohee
>>
Didn't read any of the previous discussion but I think an app that's like both a combination of ChatGPT + ST would be cool. Like if ST displayed a pane of different chats like ChatGPT, after you clicked into the character, instead of displaying the character card, which would be a different button. Honestly the way ST handles chat histories is kind of shit, though the timelines extension helps a bit.
>>
can you upload images in booga yet
>>
>>102726249
Doesn't ST already do that? You first select the character and then which chat from that character you want to use
>>
>>102726140
Great idea, who'll do the building?
>>
Wait a second.
What happens to my chars if I pull now?
I already lost chars once because of a "bug".
I have 200+ characters in different folders ranked by how much I liked them.
Any way to backup/export and preserve the folders? I hope somebody forks..
>>
>>102726288
just copy the directory which contains all your chats, if shit hits the fan and you lose everything use that backup
>>
>>102726288
just copy them somewhere and then pull and enjoy the explosions
>>
>>102726288
Just zip the whole folder my guy.
Or create a fork you push to after merging the changes (and confirming they work) from whatever ST branch you pull from.
>>
File: 1697389579824052.png (20 KB, 390x321)
>>102726288
if only there was a way to duplicate files before you pull
>>
>>102726324
Fuck all this shit, cards is all you need.
>>
File: 100-girl.png (531 KB, 1000x906)
>>102726183
lmao, I recognize how it seems but it was an honest request. I personally didn't think much of it/wasn't that into it, and then thinking more on it I realized how much it could help me out.
>>
>>102726278
Not out of the box? When you click on a character, the right pane switches to a view of the character card's details, while the middle pane switches to the last chat. You then click on another button to see a list of chats, or to see the timeline if you have that extension. And you can't really have the list of chats or the timeline just always there on the side. Also once you've swiped and then you reply to the swipe, the swipe buttons for the old reply disappear, so you have to go into the timeline or history to go back to that branch and switch to a different response. It's really not great.

If you're suggesting that this is all in fact possible and it was hidden away in the pile of options, do tell.
>>
>windows
>amd graphics card
Are kobold.cpp prebuilt binaries my only choice? I can't be arsed to install linux again and have to dual boot just to chat
>>
>>102726301
>>102726298
>>102726307
>>102726324
Man this sucks. But I guess I could repair the folders.

If anybody needs this:
The char pngs are in: /data/default-user/characters/
The folders/tags are written in /data/default-user/settings.json.
Look for ""tags": [" to get the IDs. Folders are also just Tags.
The characters get each ID under "tag_map".
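
A throwaway Python sketch to dump that mapping somewhere safe before pulling (assuming the layout described above):

import json

s = json.load(open("data/default-user/settings.json", encoding="utf-8"))
names = {t["id"]: t["name"] for t in s.get("tags", [])}  # tag ID -> tag/folder name
for char, ids in s.get("tag_map", {}).items():           # character -> tag IDs
    print(char, "->", [names.get(i, i) for i in ids])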
>>
>>102725725
>- This can be solved with jinja templates yea?
yes, this has always been my idea (if I actually had the motivation to build an RP frontend). I hate having the clunky ass prompt manager or the two dozen different text boxes for constructing a prompt, this is a solved problem already and jinja templates are the LLM industry standard at this point (Huggingface has even ported a Jinja parser to JS).
It's not the most user friendly thing but I think that's fine. You can use more complex templates for piecing together the prompt and message history, and provide a simple text box for techlets to input their preferred system prompt that gets plugged into the jinja template.
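
To make that concrete, a tiny jinja2 sketch in Python (the ChatML-ish template and field names are just an example, not any frontend's actual format):

from jinja2 import Template

CHATML = Template(
    "<|im_start|>system\n{{ system }}<|im_end|>\n"
    "{% for m in messages %}"
    "<|im_start|>{{ m.role }}\n{{ m.content }}<|im_end|>\n"
    "{% endfor %}"
    "<|im_start|>assistant\n"
)

print(CHATML.render(
    system="You are a grumpy tavernkeeper.",  # plugged in from the simple text box
    messages=[{"role": "user", "content": "hello"}],
))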

>>OpenAI completions-style parsing
>>Context builder that determines which messages + card defs + lorebook to put into the prompt
>- Will look at ST's code
>- This _seems_ fucking hard, but will look at ST's implementation to understand it.
Absolutely do not use ST's code as a reference for this, it is horrible. The OpenAI API is very simple, just build your implementation against their docs. I think in my idealized frontend I would build an abstraction that can handle turning message history -> my own internal Context format (takes into account size of defs, active lorebooks, and available token allocation to produce a full context) -> adapters that can turn a Context into a flat string prompt or messages array for the user's selected backend, which would initially be just OpenAI format since it's the most popular.
It is somewhat non-trivial, but it's more of a matter of coming up with a thoughtful architecture rather than the actual implementation itself being challenging. Don't use ST as a reference for this because there is zero thought behind any of it and shitty abstractions fucking everywhere.
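
Something like this, sketched (the ~4 chars/token counter is a stand-in for a real tokenizer):

def build_context(card_defs, lore, history, budget, count=lambda s: len(s) // 4):
    # fixed parts (card defs + active lorebook entries) get first claim on the budget
    fixed = [card_defs] + lore
    left = budget - sum(count(x) for x in fixed)
    kept = []
    for msg in reversed(history):  # fill newest-first until the budget runs out
        cost = count(msg["content"])
        if cost > left:
            break
        kept.append(msg)
        left -= cost
    return fixed, list(reversed(kept))  # adapters then flatten this per backend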

>gradio
Gradio is ass but if you build an API-first thing then whatever.
>>
File: file.png (35 KB, 581x370)
>>102724190
what version of venv or whateverthe fuck do I need?
>>
>>102726140
nobody wants to build on ST
>>
>>102726347
And chats, group chats, settings, custom system prompts, prompt templates, user personas, etc
You could also just back up the data folder they implemented a couple of months ago but this doesn't guarantee that it'll stay compatible in case they change something again.
I prefer to keep my old version around until I know that the new one works as intended.
>>
>>102726363
>Are kobold.cpp prebuilt binaries my only choice?
Well. You either compile or you don't. If you do, you have options. If you don't, you don't. Obviously, it is possible to build for windows... or use llama.cpp, but then you'll be faced with the same question...
What was the question again?
>>
>>102726411
nta. That's fine. Just do
>pip install --upgrade pip
if you want. It's just a warning. The installation of the actual packages seems to have finished correctly.
>>
>>102726363
No, you can manually compile llama.cpp and make it work. No idea how.

>>102726400
Yea, I meant look at their code to see their approach to it, not to copy their design but rather understand the approach and how users might expect it to work.
>>
There are multiple frontends that have the ChatGPT business appeal, and they can import cards as a bonus.
These projects have prebuilt APKs etc. for phones too.
What are these retards doing with silly?
>>
>>102726431
>You either compile or you don't.
>>102726471
>you can manually compile llama.cpp and make it work.

There used to be a guide in the readme of llama.cpp, guess I'll have to look for it, thanks
>>
>>102726471
What are those pre-compiled llama.cpp binaries with hip in the name?
>>
>>102725507
Character expressions, basic tts, the ability to attach lorebooks to characters, advanced lorebook controls:
https://github.com/SillyTavern/SillyTavern/issues/2189
>>
>>102726249
>he thinks they're going to put effort into the rebrand
lol, lmao
>>
>>102726532
>Character expressions
Would the ultimate rp frontend create (using an image model) and cache the expression images?
That sounds like a neat feature to have that nobody would ever use.
>>
>>102726526
Those would be the pre-compiled hip binaries. For rocm.
>>
File: AI-Dungeon.jpg (132 KB, 800x768)
>>102726532
Character expressions are post-1.0, but I do plan to have them;
TTS is a 'when I get around to it/dedicate an afternoon to implementation' for a basic implementation (XTTS).
For attaching lorebooks to characters, my current thought/approach is to have the user select a character, and then select which lorebooks to load with that character at chat time, so you can have a chat with Goku about the DBZ saga, and then turn around and have a convo about your ttrpg taking place in Illyria, using the specified lorebook for the question.
Those advanced lorebook controls are pretty fucking cool, I hadn't thought of that, and it presents an interesting angle for handling personalization/personalized responses for an ongoing persona chat: being able to identify/designate tiered pieces of info to alleviate context length limits.
Thanks for the info anon.
>>
>>102726666
No problem, Satan.
Good luck with your project!
>>
>no new models for weeks
>sillytavern rebranding as a productivity app
it's fucking over.
>>
>>102726749
Excuse me, his name is super satan.

>>102726666
Can you add a feature to your list where the chat history is summarized with each new message and only the summary + the last N messages are sent to the model instead of the whole chat?
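
Something like this, sketched in Python (summarize() is a stub standing in for an actual LLM summarization call):

def summarize(old_summary, messages):
    # stub: in practice another LLM call that folds these turns into the summary
    return (old_summary + " " + " ".join(m["content"] for m in messages)).strip()

def build_messages(summary, history, n=8):
    recent, older = history[-n:], history[:-n]
    if older:
        # a real version would only fold in turns that just fell out of the window
        summary = summarize(summary, older)
    prompt = [{"role": "system", "content": "Story so far: " + summary}] + recent
    return prompt, summary  # keep the updated summary for the next turn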
>>
>>102726666
Are local models better than AI dungeon was?
>>
Man, I really, really liked ST's scripting feature comboed with quick reply. I am definitely too dumb to implement something like that myself in any capacity.
>>
>>102726782
Yep, can do.

>>102726749
Thanks anon!

>>102726811
Technically yes, but I'm not aware of anyone that's trained a model along the same lines. Kobold is something like it with its RPG stuff. I personally haven't gone too deep into the RP stuff with LLMs, to be honest, but if some anon would care to share, that'd be great. Shit, you might just be able to get by with a character card as a narrator? idk
>>
Fucking hell
I'm doing some docker bs and fucked up an ollama container, so I literally copypasted the models folder from one container to another, and ollama refused to acknowledge the copied models.
Fuck ollama. Why do people make everything compatible with this piece of shit instead of llama.cpp?
>>
>>102726920
What's compatible to ollama but not llama.cpp? Sounds like a skill issue to me.
>>
>>102726920
lmao
>>
>>102726920
>Why do people make everything compatible with this piece of shit instead of llama.cpp?
Such as?
>>
>>102726928
>>102726922
I'm trying to run perplexica but I have seen a bunch of other shit I don't remember that did the same.
>>
>>102726920
Oi do you have a loicense for those models?
>>
>>102726892
just don't update
>>
>>102723923
yes you can call the claude api locally on your computer
>>
>>102726892
I did as well. I had a lot of scripts running.

>>102727073
I just duplicated the entire Sillytavern folder. So, if I encounter an update that removes RP features, I'll just use my backup.
>>
>>102726624
>they
Who?
>>
>think about what it would take to implement right pane chat history and persisting swipe buttons, vs hacking ST-like functionality onto other apps or just making a new app from the ground up
>for a moment the thought flashes in my mind that maybe, just maybe, it might be easier to just deal with the js spaghetti, surely those two features wouldn't be that hard to add
>>
>>102727283
Actually with that said, why can't there just be swipe buttons on every post including the user's? Could make for some interesting uses.
>>
>>102727379
it could change the context of the roleplay and make the next user reply nonsensical. I make branches when I want to adjust an old message and restart organically from there.
>>
>>102727497
Some users may want the ability to swipe in place, even if just to check what the other messages were, instead of branching and then deleting.
>>
>>102727497
I'm implying that by having persistent swipe buttons everywhere, you would either switch branches (like ChatGPT) or create a new one. You wouldn't have to go back, press the branch button, and then the swipe button. And persistent swipe buttons would also let you have a quick glance/reminder as to which replies you used swipes on and how many, as well as which swipe you're on. Frankly, right now ST just does not give an equivalent experience to ChatGPT because of this and that other feature. It seems small but it's actually pretty important to feeling good to use.
>>
File: chatgpt-ui-thing.gif (113 KB, 2032x1392)
>>102727528
Ah I never used chatgpt, when I typed >>102727520 I was thinking of previewing old swipes *then* hitting branch if I decide to branch.

Whatever you call this (automatic branch navigation? tree navigation?) would definitely be smoother way to move forward, backward, and sideways. And right pane mentioned in >>102727283 would visualize the tree, if I'm reading this correctly.
>>
File: Untitled.png (729 KB, 1080x1584)
Preference Optimization as Probabilistic Inference
https://arxiv.org/abs/2410.04166
>Existing preference optimization methods are mainly designed for directly learning from human feedback with the assumption that paired examples (preferred vs. dis-preferred) are available. In contrast, we propose a method that can leverage unpaired preferred or dis-preferred examples, and works even when only one type of feedback (positive or negative) is available. This flexibility allows us to apply it in scenarios with varying forms of feedback and models, including training generative language models based on human feedback as well as training policies for sequential decision-making problems, where learned (value) functions are available. Our approach builds upon the probabilistic framework introduced in (Dayan and Hinton, 1997), which proposes to use expectation-maximization (EM) to directly optimize the probability of preferred outcomes (as opposed to classic expected reward maximization). To obtain a practical algorithm, we identify and address a key limitation in current EM-based methods: when applied to preference optimization, they solely maximize the likelihood of preferred examples, while neglecting dis-preferred samples. We show how one can extend EM algorithms to explicitly incorporate dis-preferred outcomes, leading to a novel, theoretically grounded, preference optimization algorithm that offers an intuitive and versatile way to learn from both positive and negative feedback.
neat.
>>
Good work to the anon who made Mikupad. It's simple and clean. I'm trying to make a similar-ish web interface, and I don't know if it's simple for those better at web dev, but it is more difficult than it looks. That is all.
>>
While we're on the subject of frontend features, I think having a full cross-chat, cross-character text search feature would be cool. One of the issues with current chat history browsing UIs is that it's kind of difficult to know or remember what each chat really contained, when you're a heavy user and you have tons of chats. If you could just do a quick search across all chats and then press on the link to go to that chat, that would be pretty amazing. It makes me think of the difference between using a folder-based file browsing system vs a fast search-based file browsing system (like Everything.exe and Fsearch for Linux). Search is amazing for some types of file browsing tasks, while folder-based is still good for others.

ST does have a search feature, but its search range is limited to the character you currently have open, so you can't search across truly all chats, plus you still need to actually go and press a button to open the chat history menu and then you get access to the search bar. Would be so much better if that menu existed in the right pane rather than as a temporary pop up.
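
Since the chats are just JSONL on disk, a crude cross-character search already works outside the UI (the path and the "mes" field are from memory of a default install, so double-check):

import json, pathlib

def search_all_chats(query, root="data/default-user/chats"):
    for f in pathlib.Path(root).rglob("*.jsonl"):
        for line in f.open(encoding="utf-8"):
            try:
                msg = json.loads(line)
            except json.JSONDecodeError:
                continue
            if query.lower() in str(msg.get("mes", "")).lower():
                yield f, msg.get("name", "?"), msg["mes"]

for path, who, mes in search_all_chats("tavern"):
    print(path, who + ":", mes[:80])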
>>
>>102727632
A tree swipe like that would be amazing, strange that silly people didn't do it. I thought everything there is already packed as a graph.
>>
File: 39_06429_.png (1.03 MB, 1280x720)
>>102723173
>>
File: Untitled.png (213 KB, 1032x984)
Presto! Distilling Steps and Layers for Accelerating Music Generation
https://arxiv.org/abs/2410.05167
>Despite advances in diffusion-based text-to-music (TTM) methods, efficient, high-quality generation remains a challenge. We introduce Presto!, an approach to inference acceleration for score-based diffusion transformers via reducing both sampling steps and cost per step. To reduce steps, we develop a new score-based distribution matching distillation (DMD) method for the EDM-family of diffusion models, the first GAN-based distillation method for TTM. To reduce the cost per step, we develop a simple, but powerful improvement to a recent layer distillation method that improves learning via better preserving hidden state variance. Finally, we combine our step and layer distillation methods together for a dual-faceted approach. We evaluate our step and layer distillation methods independently and show each yield best-in-class performance. Our combined distillation method can generate high-quality outputs with improved diversity, accelerating our base model by 10-18x (230/435ms latency for 32 second mono/stereo 44.1kHz, 15x faster than comparable SOTA) -- the fastest high-quality TTM to our knowledge.
https://presto-music.github.io/web/
From Adobe, so no release ever I'm sure, just like the rest of their AI stuff that just rots away somewhere. Anyway, posting since musicgen is rare and the examples sounded good
>>
File: arrows.png (425 KB, 1069x1081)
>>102727632
Probably not what you mean, but it made me think about this. https://github.com/p-e-w/arrows

This is different, but fun too https://github.com/the-crypt-keeper/LLooM
>>
>>102727735
I guess you could say that since https://github.com/sam-paech/antislop-sampler
was just updated so you can supposedly use it with OpenAI compatible API programs. No more shivers, but perhaps jolts of electricity instead? Perhaps, just perhaps.
>>
>>102727700
There are actually two anons who could be said to have "made mikupad": the OG who created a pastebin and later a codeberg repository (https://codeberg.org/mikupad/mikupad), and the lmg-anon who continued to develop the original pastebin. I wonder if the former is still around...
>>
Guys, I'm very retarded, but is there a significant difference between exllama2 and llamacpp in terms of output quality? Honestly feels like my outputs are better in llamacpp than they are in exllama2
>>
>>102727806
llama.cpp quants are OP
>>
>>102727735
I want to subscribe to the Mikusex Times
>>
File: Untitled.png (1002 KB, 1080x1676)
SparsePO: Controlling Preference Alignment of LLMs via Sparse Token Masks
https://arxiv.org/abs/2410.05102
>Preference Optimization (PO) has proven an effective step for aligning language models to human-desired behaviors. Current variants, following the offline Direct Preference Optimization objective, have focused on a strict setting where all tokens are contributing signals of KL divergence and rewards to the loss function. However, human preference is not affected by each word in a sequence equally but is often dependent on specific words or phrases, e.g. existence of toxic terms leads to non-preferred responses. Based on this observation, we argue that not all tokens should be weighted equally during PO and propose a flexible objective termed SparsePO, that aims to automatically learn to weight the KL divergence and reward corresponding to each token during PO training. We propose two different variants of weight-masks that can either be derived from the reference model itself or learned on the fly. Notably, our method induces sparsity in the learned masks, allowing the model to learn how to best weight reward and KL divergence contributions at the token level, learning an optimal level of mask sparsity. Extensive experiments on multiple domains, including sentiment control, dialogue, text summarization and text-to-code generation, illustrate that our approach assigns meaningful weights to tokens according to the target task, generates more responses with the desired preference and improves reasoning tasks by up to 2 percentage points compared to other token- and response-level PO methods.
https://github.com/huawei-noah/noah-research/tree/master/NLP/sparse_po
Code not up yet. To me this seems like a very useful tool for making an RP model.
>>
>>102727923
Seems interesting, thank you anon.
>>
>>102727763
How does this work exactly? Does it always assume the character name is slop? Elara is in the prompt but it gets substituted every time in the example video.
>>
>>102727923
I wonder if llama.cpp people could learn something from this to augment KL divergence measurements. For instance, when used with different datasets, this could prove exactly how badly quants degrade on specific subject areas (namely RP) and not just on a generic one like wikitext. Of course we can already measure with different datasets, but doing it only on the tokens that matter might give us a clearer picture.
>>
File: Untitled.png (762 KB, 1080x1623)
UniMuMo: Unified Text, Music and Motion Generation
https://arxiv.org/abs/2410.04534
>We introduce UniMuMo, a unified multimodal model capable of taking arbitrary text, music, and motion data as input conditions to generate outputs across all three modalities. To address the lack of time-synchronized data, we align unpaired music and motion data based on rhythmic patterns to leverage existing large-scale music-only and motion-only datasets. By converting music, motion, and text into token-based representation, our model bridges these modalities through a unified encoder-decoder transformer architecture. To support multiple generation tasks within a single framework, we introduce several architectural improvements. We propose encoding motion with a music codebook, mapping motion into the same feature space as music. We introduce a music-motion parallel generation scheme that unifies all music and motion generation tasks into a single transformer decoder architecture with a single training task of music-motion joint generation. Moreover, the model is designed by fine-tuning existing pre-trained single-modality models, significantly reducing computational demands. Extensive experiments demonstrate that UniMuMo achieves competitive results on all unidirectional generation benchmarks across music, motion, and text modalities.
https://hanyangclarence.github.io/unimumo_demo
https://github.com/hanyangclarence/UniMuMo
Now your Miku can dance. Pretty neat, check the examples in the demo. Weights seem to be up (just finetuned from other already existing stuff)
>>
>>102727958
Idk, my assumption is that they simply just have a list of strings they check against.
>>
>>102728055
Nice, but it's not THE dance, the one that's as old as time.
Unless...
>>
zamba gguf?
>>
>>102728275
2 more years
>>
File: 172833259338669.png (478 KB, 512x768)
>>102728055
>Motion Generation
I'm waiting for it to go mainstream, imagine generated motions for an avatar, a character in a game, or even a robot in the near future. A new motion modality is essential for understanding the world. There is also no shortage of data for training, just use openpose on existing videos and movies, then feed both motion tokens and dialogues to an LLM. Perhaps even a finetune could be enough
>>
>>102728055
is this real time?
>>
File: Quants.png (349 KB, 2400x2400)
>>102727806
I mean, they shouldn't be better... but subjectively I've kind of noticed the same thing.
>>
dead thread, it's fucking over for local
>>
>>102727806
Most backends that use exllamav2 by default apply temperature first, as this is the "standard" way the transformer library does it. But gguf-using backends will apply temperature last by default, because it's generally agreed that it gives better results. So if you don't specify temp last with exl2, you may get lower quality.
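
A toy numpy sketch of the difference, using top-p as the truncation stage: temp-first flattens the distribution before the nucleus is cut (so high temp lets more junk tokens in), temp-last cuts the nucleus at temp 1 and only then flattens the survivors.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def top_p_sample(logits, temp=1.5, p=0.9, temp_last=True):
    base = logits if temp_last else logits / temp  # temp-first flattens before truncation
    probs = softmax(base)
    order = np.argsort(probs)[::-1]
    cutoff = int(np.searchsorted(np.cumsum(probs[order]), p)) + 1
    keep = order[:cutoff]                          # the top-p nucleus
    final = softmax(logits[keep] / temp)           # temperature applied to the survivors
    return np.random.choice(keep, p=final)

logits = np.array([4.0, 3.0, 1.0, 0.5, 0.1])
print(top_p_sample(logits, temp_last=True), top_p_sample(logits, temp_last=False))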
>>
>>102728765
Here, have this Miku
>>
>>102724511
He is right, incels deserve to suffer. It's not enough that they don't get pussy in real life, they shouldn't even be allowed to fantasize about it. Y'all need to grow up.
>>
Personally I can't wait for the ServiceTensor rebrand.
>>
>>102728868
Tesnor*
>>
>>102728872
Stop being immature. It was obviously a typo. Bullshit like this is why the rebrand is necessary.
>>
File: 1728337814409613.png (458 KB, 768x512)
>>102728384
>>
I'm having fun with 12b RP models, prompting them to be a website and streaming their responses directly to the browser. This would greatly benefit from faster inference and could be fleshed out by separating the character from a webdev agent, essentially turning an LLM into its own user interface. Still interesting to see what different models come up with. Pic somewhat related, qwen2.5 7b ablit.
>>
>>102724947
On an Arch-based distro I could get all necessary packages for an RX 6800 from the AUR.

>>102725521
I don't know what Transformers does, but in llama.cpp the memory in VRAM needs to be pre-allocated in order to get one contiguous block.
If you do many small allocations and deallocations you end up with gaps in between that are essentially wasted VRAM because they're too small to fit a new allocation.
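
A toy first-fit simulation shows the failure mode: total free space can exceed a request while no single hole fits it.

def first_fit(free, size):
    # free: list of (offset, length) holes
    for i, (off, length) in enumerate(free):
        if length >= size:
            free[i] = (off + size, length - size)
            return off
    return None  # "OOM" even though free bytes may remain

free = [(0, 100)]           # pretend VRAM is 100 units
first_fit(free, 40)         # block A at [0, 40)
b = first_fit(free, 20)     # block B at [40, 60)
first_fit(free, 20)         # block C at [60, 80), leaving [80, 100) free
free.append((b, 20))        # free B: 40 units free in total...
print(first_fit(free, 30))  # ...but split 20 + 20, so this prints None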
>>
Just had sad mikusex with miku cause of the ST fiasco, looks like I gonna need to create my own front-end and maintain it for myself.
Is doing it with flask a bad idea for a project like this? I haven't touched anything but python in my life, so I'm scared to do anything else.
>>
I've been in a coma since September 2022, as I understand the current situation is that there are two options: pay for an API from OpenAI and use a custom frontend or run local models, so have local models reached at least pre-filter characterai level?
>>
>>102729142
What is the "ST fiasco"?
>>
>>102729144
123b Luminum most certainly has.
>>
File: 1705071026716226.png (22 KB, 878x160)
22 KB
22 KB PNG
>>102729160
>>
>>102729142
Absolutely. I use bottle with gevent for mine.
>>
>>102729142
>ST fiasco
I thought they just deleted some default assets? or are they planning to remove actual features?
>>
File: file(22).png (17 KB, 1083x69)
17 KB
17 KB PNG
>>102729160
>>102729176

see
>>102721448
>>102721850
>>
>>102729188
so they're changing some terminology which doesn't matter, and removing proxy shit which only matters to /aicg/ niggers
nothingburger
>>
>>102729165
>cohee melts down
>spergs out on discord
>says ST is not a roleplay frontend
>creates new branch and deletes all RP content
>these are "ui label/docs terminology changes" now
>everyone else is "up in their feels"
I don't care either way because they can't delete my SillyTavern folder but this has to be the quickest rewriting of events in the history of the world. This shit just went down half a day ago and they're already lying about it.
>>
>>102728852
Basado
>>
File: 1674219559276175.gif (3.03 MB, 359x202)
3.03 MB
3.03 MB GIF
>>102724751
>Nobody needs such a frontend for coding questions, or to ask general questions to an AI.
i do. where else am i going to rape my secretary as a $0pa income NEET??
Are you going to lend me YOUR secretary to solve hypervisor questions before sucking my cock?
I thought not.
>>
>>102728609
Do you have the source for this image?
>>
>>102729228
>says ST is not a roleplay frontend
The application that uses character cards, and embeds one with an anime girl called Seraphina, is not a roleplay frontend? Lmao
>>
>>102729332
>and embeds one with an anime girl called Seraphina
not anymore!
>>
File: Untitled(4).png (48 KB, 1202x211)
48 KB
48 KB PNG
>>102729332
you will use the blank slate and you will like it
>>
is miqu supposed to take 1000 seconds to generate 1 reply on a 4080?
>>
alright you fuckers got me. I can't deal with this dumb shit anymore. would anyone be so kind as to answer a few questions? can I run sonnet 3.5 on a 6gb vram card (rtx 2060)? can I use my jailbreaks like I do on ST? and can I sync my chats between devices? thank you in advance lads.
>>
File: file.png (17 KB, 861x256)
17 KB
17 KB PNG
>>102729332
https://github.com/SillyTavern/SillyTavern/commit/4d35fea3b3243a02e333747b9298bada0fdb3aab
>>
>>102729372
Even 24 vram can barely load 2.5 bpw exl2 miqu with a 4-bit cache. With 16 vram you're splitting heavily to your cpu/system ram, which will make it slow as hell. 1000 seconds is still really slow though. Were you loading a ton of context? How fast if your cpu/ram? Even when I offload massively to my cpu, my replies aren't that slow.
>>
>>102729382
Bait used to be believable.
>>
>>102729420
all ui label/docs terminology changes btw
>>
>>102729382
Anon you're fucking clueless but you gotta start somewhere I guess.
Sonnet 3.5 is a gigantic model (think 100+ GB VRAM) running somewhere in a server farm, and they sell you access to it, but they don't make the model weights downloadable. Your computer does not matter at all, you're just using their webpage.
A JB is just some text you pass to a model, so yew, you can pass any JB to any model ever, that has nothing to do with your computer or chat frontend.
You can open ST in your phone, but since you're so clueless, I bet you don't wven understand how a local network works or what an IP address is.
>>
>>102729432
>>102729436
the duality of /lmg/
>>
>>102729228
>not an RP frontend
*not an RP-only frontend
>>
>>102729420
Did this retard get cold feet after the Permiso niggers set up the GPT honeypots or what? It's not like it's the first time ERPoomers have been on the news
>>
>>102729349
>corrupt people playing dumb
Ahh, ahh society.
>>
>>102729432
que? no hablo ingles.
>>102729436
rude. I have ST running through termux BTW so that's why I asked in the first place. I just never cared about running anything locally before. I got baited by some schmuck on /aicg/, but that's on me I guess.
>>
>>102729420
damn completely buckbroken by journofags. how embarrassing
>>
I hate Discord drama like you wouldn't believe
>>
>>102729428
4096 context, ryzen 9 7950X3D
Not too sure what i should be setting kobold to for this
>>
>>102729443
>I melted down about people referring to my fork of a roleplaying frontend as a roleplaying frontend and then deleted all of the roleplaying assets but what I was really mad about was the 0 people saying it was a roleplay-only frontend
okay cohee
>>
>>102729496
You can leave any time.
>>
ST always had terrible UX anyway.
- Some things are auto-save, some things are click-to-save
- A delete icon can be a trash can, a skull or an X. Sometimes X means quit
- To see all chats with a card, you have to use a tiny menu separate from the card listing
- Can't use a preset per card
- Can't use a proxy/model per preset
- Transparency in panels means there's a fuckton of overlapping text
>>
File: file.png (116 KB, 795x628)
116 KB
116 KB PNG
>>102729349
Weidmann (p.e.w) dev of DRY and XTC on the matter.
>>
>>102729496
Replace "hate" with "have" and now we're talking.
>>
>>102729500
Well, your CPU is better than mine. I'm using a 7800X3D. I don't know why you're getting such slow gens, unless each reply is really long?

I average 0.5 t/s with Largestral 123b IQ2_S, which is also way beyond my 4090's vram capacity.
>>
>>102729517
>A delete icon can be a trash can, a skull or an X. Sometimes X means quit
Skull for character deletion is sovl, justified.
>>
>>102729521
I would use captain blackbeard to backup my files.
>>
File: 1713678790144788.png (14 KB, 561x588)
14 KB
14 KB PNG
>>102729541
Each reply is around a small paragraph in size, i'll upload my config real quick
>>
>>102729349
All this aggressive 'we need to burn it down, all roleplay is horrible and for weirdos" talk makes it seem like either the lead dev got a girlfriend who laughed at him after seeing his personal project or he just decided that he's done and really wants it to get bought by someone.
>>
File: 1717966302435509.png (14 KB, 545x582)
14 KB
14 KB PNG
>>102729563
>>
>>102729561
how many of your files involve immoral criminal activity like piracy?
>>
File: 1723876399997936.png (10 KB, 547x577)
10 KB
10 KB PNG
>>102729575
>>
File: 1726101361300788.png (15 KB, 543x579)
15 KB
15 KB PNG
>>102729586
>>
>>102729566
It's just CAI all over again, beautiful to see in a open source project.
Proprietary is a state of mind.
>>
File: file.png (93 KB, 1114x289)
93 KB
93 KB PNG
Writing was on the wall really.
>>
>>102729596
c.ai likely had the model provider stepping on their toes over like it happened with ai dungeon
there is nobody who has that power over the st devs unless they are trying to get bought or it's a personal issue
>>
so wheres the fucking FORK
>>
>>102729491
You got the answer you wanted, nigger. Hell, you got an answer.
Do your own research next time you ungrateful cunt.
>>
File: 1724829161442923.jpg (75 KB, 1500x1500)
75 KB
75 KB JPG
>>102729635
This is all we have here
>>
So, out of the models I've tried, I like Midnight Miqu the most, but 0.62T/s is just way too slow. Where can I rent it for use on a cloud service, how much would it cost and how long context window can you get with an online service? Are there any models that are straight upgrades that you "might as well use" if you go cloud? I like Midnight Miqu's style and haven't found any glaring flaws either.
>>
>>102729635
It's not worth being forked, the code quality is shit, no actually useful RP features, terrible UX. The only redeeming qualities were its popularity and some level of maintenance.
>>
>>102729664
no, shove it up your ass bitch. I didn't order a side of attitude.
>>
>>102729715
You know I would really like to play a proper roleplaying game with future models, where a dungeon master could create character cards dynamically but I worried that silly tavern was unlikely to support this kind of work load. I would like to not have to this myself because I am somewhat incompetent but if a successor project arises from here I would really appreciate a frontend that could handle this.
>>
>>102729741
Fine, then don't ever ask anything again if you don't want people calling out your severe mental retardation, you low IQ mouthbreather.
>>
>>102729796
I do what I want you dumb wanker. My fault for including a please and thank you for you degen lot. Clearly your parents never loved you, couldn't be arsed to raise you, fucking shitstain loser.
>>
so wait everybody's mad because sillytavern is going to get more useful instead of being a thing you use to erp?
>>
>>102729664
>>102729741
>>102729796
>>102729845
samefag
>>
amazing how much of a stillbirth llama 3.2 was
>>
>>102729864
>so wait everybody's mad because sillytavern is going to get more useful
No, since no one but you thinks that's going to happen instead of the reality of them just deleting a bunch of stuff.
>>
>>102729845
>I do what I want you
Me too, and I am calling you a retarded nigger.
>>
>racism outside of /b/
>>
>>102729910
I'm not calling you anything. I know for a fucking fact everyone despises you, you failed abortion. Just stating the obvious.
>>
>>102729962
Ok
>>
>>102729795
I've made a very basic RP frontend with recruitable generated characters, dungeon crawling, and party management, and it feels 100 times better than ST, as it's exactly to my liking. I wish there were a similar project where I could contribute, but I'm not keen on starting and managing a public project myself.
>>
File: 1726777837127413.jpg (2.09 MB, 3000x2609)
2.09 MB
2.09 MB JPG
>>102730009
>>
>>102730009
dump it, fag
>>
Hi all, Drummer here...

Feels extra nice to be a Kobold user today.

Also pls test: https://huggingface.co/BeaverAI/Behemoth-123B-v1a-GGUF
>>
File: file.png (35 KB, 899x491)
35 KB
35 KB PNG
Ready for another day of serious business with our agents lads?
>>
>>102730355
can't you cook something in the 30B range
>>
>>102724866
you dont need more than a single html page with javascript
>>
File: image(1).png (105 KB, 1272x697)
105 KB
105 KB PNG
>>102730355
>Feels extra nice to be a Kobold user today.
eh
>>
>>102730440
>Reverse proxies will be removed
Uhhhhhhh, won't this also affect local?
>>
>>102730543
see
>>102730535
>>
>>102730535
>fag flag pfp
Of course
>>
>>102725507
Group chats
>>
>>102730507
I've got Star Command R 32B?

Horde: aphrodite/BeaverAI/Behemoth-123B-v1a
>>
>>102730355
IQ4_XS gguf when
>>
>>102730592
CR is slow and shit, though
>>
>Silly Tavern is a serious project for serious people!

Lmao, the absolute state. Is it autism? Did someone pat him on the back, and now he feels like an adult man and does not want to hang out with the internet anymore? All in all, this is pretty funny. Nobody will use Silly Tavern for anything other than roleplaying; why the fuck should they? The client itself is bloated mess and it is mostly used by people who have no technical knowledge and just want to chill and roleplay there nothing that silly tavern have on other front end that would make me think otherwise.
>>
https://phys.org/news/2024-10-nobel-prize-physics-awarded-discoveries.html
Hintonbros.... we won
>>
File: 1722768150745458.png (83 KB, 980x658)
83 KB
83 KB PNG
>>102730644
>the very serious discussion going on at ServiceTensor discord
>>
>>102730617
What about an upscaled Mistral Small like Theia? We're thinking of either a 39B or a 45B upscale.
>>
>>102730666
30B is the best I can do, sorry.
>>
>>102730644
Multiple people have stated in this very thread that ST is very useful for assistant-type stuff.
>>
>>102730663
Based, incels should stay away.
>>
>>102730663
kek based
>>
>>102730663
Discord seems marvelously bad for productive software development.
>>
It will be better to start creating the front end from a clean state for RP purposes.
>>
I wish I had enough time and motivation to do a SillyTavern fork but unfortunately I don't
>>
github.com/open-webui/open-webui with ollama backend is better.
>>
>>102730773
same. just easier to not pull until someone else does it
>>
>>102730773
It is not worth it; it will be better if we create our own. I will have a free weekend, and I do not plan to go drinking.So if there is no other anon starting some project. I will start one. If some Anon feels active, they could post some bullets outlining what should be the most important features implemented right away.
>>
>>102730355
this drama is so peak
pure cinema
*sips cum elegantly*
>>
>>102730663
What would discordfags do without their safe space?
>>
File: chrome_098ltsTy2P.png (104 KB, 1032x829)
104 KB
104 KB PNG
>>102730797
>>
>>102730826
>It is not worth it; it will be better if we create our own
Delusional. It will take you years to reach feature parity and your code will be just as much of a mess by then, assuming you don't just give up entirely.
>>
>>102730826
>what should be the most important features implemented right away.
imo they would be:
- all presets that silly already supports
- import and export presets (should support the ST format)
- support for configuring formats like ST
- import and export formats (should support the ST format)
- support for llama.cpp, koboldccp and OpenAI API
- card management for importing, editing, adding, removing
- basic chat interface with avatars, regen, continue, delete/edit message buttons.

And that's it.
>>
>>102730942
Lol, Silly isn't THAT complex. Do zoomers really?
>>
I dont understand...for RP Silly is king.
There are many ChatGPT clones. Why compete with them?
Silly Devs better not break and delete everything with their changes anymore.
I doubt the serious business are as forgiving.
They dont even have anything precompiled, just the source for the nerds.
No exe, appimage, apk. Would that need bigger changes? What are they doing.
>>
>>102730978
>What are they doing.
being silly
>>
File: fables.gg.png (9 KB, 112x77)
9 KB
9 KB PNG
>>102726666
good alternative for AI Dungeon is now Friends and Fables. Too bad there no free opensource alternative.
>>
>>102730355
>>102730610
seconding this, iq4_xs just about fits perfectly in 64gigs
>>
>>102730963
They have cool scripting though.
I could make a spoilered CoT interception on user post that deletes old ones and then triggers the char response afterwards.
Didn't really improve anything but I'm sure you can make some cool stuff with quickreply scripting.
>>
>>102730978
Their ego got too big. They think they are special shit just because they are the most popular frontend for coom. In reality, ST is garbage, with an unintuitive and bloated interface. Shit like LM Studio absolutely mogs it.
>>
>>102731016
LM Studio is great as a gpt clone. I use it myself.
And you have everything in one place. Loading the gguf etc. Silly is just a server.
>>
>>102723336
Really? I tried it and didn't think it was that much different from Rocinante
>>
>>102731002
That is actually pretty cool.
>>
>>102731004
i never used that, so it must be bloat
>>
>>102731002
>>102731046
Buy. Ad. Now.
>>
>>102731016
>>102731031
In terms of llama.cpp frontends I would recommend GPT4All over LMStudio.
It's open source and one of the devs is making upstream contributions to llama.cpp so I have more confidence in that software actually working as intended.
(I myself am using neither.)
>>
>>102731002
this is amazing!
>>
>>102731002
Cool!
>>
>>102731002
Wow, that is awesome! Thanks for sharing, super cool stuff!
>>
>>102731089
Now you done it..
>>
File: kek.png (117 KB, 691x722)
117 KB
117 KB PNG
>>
File: 1699308729180905.jpg (153 KB, 1057x483)
153 KB
153 KB JPG
>>
>>102731002
I'll definitely try it when I have the time but honestly the example interaction they show on their website is not promising.
All rolls made to gain information should be made by the GM in secret and I find it baffling that they would choose to show an interaction that goes against this principle (even if in this particular case it doesn't matter).
>>
>>102731200
>>102730440
>Ready for another day of serious business with our agents lads?
>>
>>102731224
Brought to you by ProTalk AI™.
>>
>>102726922
Openwebui as far as I can tell
>>
File: 1728387024546.jpg (782 KB, 1080x1900)
782 KB
782 KB JPG
>>102730355
It's retarded...
>>
>>102731244
More like your character's retarded
>>
File: 1728388829550.jpg (796 KB, 1080x1911)
796 KB
796 KB JPG
>>102731471
That's true, but when I use Largestral she does understand the question properly.
>>
>>102731640
>>102731640
>>102731640
>>
>>102729612
>it was not and will not
>it was not
>original readme specifically called out roleplaying
lol
lmao
I get the whole branding thing, but c'mon, that's just delusional.
>>
so im a bit confused here,
so i trained a model to do a task whos response was numerical value, trained it so it returned that response in words not digits,
the 8b instruct model trained on the dataset will do this
the 1b instruct model will randomly use decimal digits instead of words in its response ,

what is up with that?
>>
>>102730009
Release it and let people fork it.
>>
>>102725350
This already happened
>>
>>102731791
No
Shit
Sherlock
>>
>>102725753
Also need a CoT mode, like it should be built in and toggleable and cleverly implemented, not just an afterthought
>>
>>102731805
anon
it
already
happened
>>
>>102731807
>and cleverly implemented
What would that look like?
>>
>>102731826
The only winning move is not to play.
>>
>>102731826
Only keep the most n recent CoTs to prevent repetition and patterns, or better yet, keep the most important CoTs if possible, force reasoning as a third party, not as the character (similar to o1)
Top of my head
>>
>>102731905
That would be pretty easy to implement as an ST extension.
I made an extension runs N prompts after the assistant's response that has the option to only keep the latest result in the chat, so that's something I know is not hard to implement, and I'm sure my implementation is messy as fuck since I didn't invest more than a couple of braincells while fapping to do that.
Something with more knowledge of ST's API's and such could do a much cleaner job, I'm sure.



[Advertise on 4chan]

Delete Post: [File Only] Style:
[Disable Mobile View / Use Desktop Site]

[Enable Mobile View / Use Mobile Site]

All trademarks and copyrights on this page are owned by their respective parties. Images uploaded are the responsibility of the Poster. Comments are owned by the Poster.