/g/ - Technology
File: migu general.jpg (151 KB, 1216x832)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>103278810 & >>103265207

►News
>(11/22) LTX-Video: Real-time video generation on a single 4090: https://github.com/Lightricks/LTX-Video
>(11/21) Tülu3: Instruct finetunes on top of Llama 3.1 base: https://hf.co/collections/allenai/tulu-3-models-673b8e0dc3512e30e7dc54f5
>(11/20) LLaMA-Mesh weights released: https://hf.co/Zhengyi/LLaMA-Mesh
>(11/18) Mistral and Pixtral Large Instruct 2411 released: https://mistral.ai/news/pixtral-large
>(11/12) Qwen2.5-Coder series released: https://qwenlm.github.io/blog/qwen2.5-coder-family

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/hsiehjackson/RULER
Japanese: https://hf.co/datasets/lmg-anon/vntl-leaderboard
Censorbench: https://codeberg.org/jts2323/censorbench

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
File: ads_for_sale.png (1.96 MB, 1536x1536)
►Recent Highlights from the Previous Thread: >>103278810

--Uncensored AI models and sensitive topics discussion:
>103282064 >103282087 >103282279 >103283053 >103283136 >103283169 >103283256 >103283271 >103283305 >103283354
--Meta tests new Llama variants on LMSYS Arena:
>103279988 >103280042 >103280113 >103280056 >103280196
--Largestral 2411 exl2 quant testing with Llamiku prompt:
>103282646 >103282864 >103282989 >103283393
--Investigating Crestfall model's tokenizer size bloat and potential issues:
>103284469 >103284496 >103284516 >103284518
--GPU upgrades and LLM performance discussion:
>103280229 >103280279 >103280560 >103280695 >103280445 >103280481
--DeepSeek R1 release status clarified:
>103280873 >103280911 >103281007
--Anon's model is stuck in a loop, repeating the same word:
>103283931 >103283944 >103283997 >103283952 >103283954 >103283973 >103284153 >103284239 >103284045
--Anon is impressed with R1's text generation capabilities, particularly its description of a wolfgirl's tail:
>103284440
--4chan vs Reddit as training data for AI:
>103282208 >103282272
--Discussion on the UGI-Leaderboard and AI model performance:
>103285097 >103285407 >103285247 >103285575 >103285630 >103285747
--Anon seeks optimal Midnight-Miqu model for 24GB GPU:
>103280078 >103280170 >103280240 >103280717
--Anon questions how open source model works for corps with LLMs:
>103284508
--Anon proposes a simple RP/Smut benchmark, needs degens to rate:
>103285425 >103285513
--Anon explores usage of "hum" in text-based role-playing game dataset:
>103280759 >103280929
--Anons discuss AI chatbot behavior and identity:
>103284059 >103284562
--Miku (free space):
>103278815 >103279339 >103281386 >103281569 >103282026 >103282351 >103282386 >103282491 >103282866 >103284524 >103284822 >103285448

►Recent Highlight Posts from the Previous Thread: >>103278812

Why?: 9 reply limit >>102478518
Fix: https://rentry.org/lmg-recap-script
>>
File: 1713725937922459.jpg (79 KB, 1034x1002)
>>
???
>llama-32B-instruct
>llama-vision-3B
Interesting placeholder name choices
https://huggingface.co/enterprise
>>
File: 1731162001019142.png (10 KB, 764x860)
>>103286702
Someone is cranky today
>>
File: basevsbehemoth.png (236 KB, 1923x581)
Did some testing with the new Behemoth v2.1 compared to Mistral 2411. Same generation settings, same prompt. It's absolute garbage.
>inconsistent prose
>repetition issues
>less detail
>replies as {{user}}
This card is just a fun fuckery thing I converted from some ball-draining TSA card, which it did even worse with. 2nd reply in it made one giant paragraph, repeated the same sentence 3 times in a row with minor variation, and made one paragraph with 1k letters before I stopped generation.
Seriously Drummer, this needs more time cooking or a merge with the original model to patch up its retardation.
>>
Does anyone use this for non-coom purposes?
>>
>>103286774
What's it to you?
>>
>>103286774
I normally just ask it questions, for example my most recent question was
>"Hey, what was that part of the brain that if you cut. you basically lose control of half of your body while a new personality takes over that half? I believe its the part that connects the two hemispheres."
At which point I was informed that I was thinking of the "corpus callosum"
>>
>>103286678
why the fuck cant you link properly you fucking retard
>>
Whats the best uncensored model for someone with only a 8gb gpu and 32gb of ram?
>>
File: he pulled.jpg (73 KB, 582x729)
Haven't pulled ooga booga webui in a while. Any new? Or am I having to fix shit for the next 5 hours? Largestral 2411 appears to work fine.
>>
>>103286802
why can't you read properly
>>103286678
>Why?: 9 reply limit >>102478518
>Fix: https://rentry.org/lmg-recap-script

>>103286809
https://huggingface.co/TheDrummer/Tiger-Gemma-9B-v3-GGUF
>>
>>103286811
Sorry, that's just retardanon. Don't let him get to you.
>>
>>103286774
Handy for programming. Now that Google is a SEO shit pipe and sucks for turning up non-ass answers for technical questions, it's kinda like a local copy of Stack Overflow that you can chat with instead of one that is 10 years obsolete and full of arguments.

>>103286802
Because there's now a 9 link limit which means that most of the posts can't be linked anymore, you fantastic genius.
>>
>>103286822
when the fuck did that happen?
>>
>>103286827
>>102478518
(look at the date of the post)
>>
>>103286827
Like, a month or two ago? It's been quite a while. And every two or three threads someone blames recap-LLM and its operator for it instead of reading the disclaimer that explains it because in 202X, anons just yell insults at each other instead of reading for reasons to yell insults at each other like we used to.
>>
>>103286829
>>103286831
why the fuck would it matter? linking doesnt do shit unless you are on some shitter computer maybe?
>>
>>103286835
It was theorized to be due to the schizo mass replying with a dox of one of the mods in Apple threads.
>>
File: b&.png (118 KB, 856x1024)
>>103286843
kek
>>
>>103286780
Only 4/6 of those posts are mine, also you quoted the same post twice, so 3/5.
>>
File: 1706550591769583.png (1.08 MB, 1024x1024)
>>103286843
cant we just return to having no captcha or typing nigger in again...
>>
File: file.png (18 KB, 1137x59)
>>103286854
yeah...
>>
>>103286765
You did change to meth though right?
Otherwise no wonder it replies as user. lmao
>>
File: file.png (55 KB, 750x806)
>>103286867
The changes did force him to change his format though
>>
>>103286872
There's literally no description for the model yet on Huggingface, Behemoth V1 uses Mistral too, and Drummer told me to use 2411 new system prompt so... I think you may be intentionally retarded.
>>103286021
>>
>>103286835
The hover is convenient on desktop but otherwise, yeah, just copy and find the first one listed in the previous thread and you've got it.
>>
>>103286910
Wait hold on wtf, HES USING METHARME AND SYSTEM. THIS IS UNHOLY
>>
>>103286910
lol drummer doesnt give a fuck. gotta respect that. "try it out". lol
>>
>>103286911
The OP has instructions, but just put this in a bookmark on your bookmark toolbar:
javascript:const previousThreadUrl = document.querySelector('blockquote a[href*="thread"]').href,threadId = previousThreadUrl.match(/thread\/(\d{9})/)[1];document.querySelectorAll('span.quote').forEach(quoteSpan => {const quoteIds = quoteSpan.textContent.match(/>(\d{9})/g);if (quoteIds) quoteSpan.outerHTML = quoteIds.map(id => `<a href="/g/thread/${threadId}#p${id.slice(1)}"%20class="quotelink">>>${id.slice(1)}</a>`).join('%20');});

Click it once in a thread and it will fix the recap links.
>>
>>103286822
what model and what weights are you using for programming?
>>
>>103286998
Qwen2.5-Coder-32B-Instruct Q8
>>
>It can generate 24 FPS videos at 768x512 resolution, faster than it takes to watch them.
Finally, something worth fucking around with. The last model I tried was painfully slow.
>>
>>103287007
>faster than it takes to watch them.
on a 4090. 3090 shills btfo
>>
>>103286811
Which one is the fastest? Seems like it runs kinda slow.
>>
>>103286754
ywnbam sir
>>
>>103287057
>you will never be a ma'am
>>
>>103286774
Using LM Studio and reviewing many of my files.
>>
>>103287015
I have a 4090, so yeah. Neat either way though, we needed an open source video model that isn't slow as fuck for a while, not to mention one that isn't complete dog shit.
>>
>>103287007
the videos are incoherent shit though
>>
>>103287068
For illiterate: "you will never be a miku sir"
>>
I've noticed quite a few models on HF experimenting with CoT/reasoning baked in. R1/o1-style local models seem close to the gate.
>>
What front end do people use to run these? I know of tavern and open web ui. There are a couple in the OP; out of those, which are the best?
>>
>>103287315
sillytavern
>>
>>103287315
koboldcpp is the best.
>>
>>103287331
more like poobold
>>
>>103287315
kobold to load them, sillytavern to access them
>>
>>103287350
ya know, I feel like I'd see indians using 'indiachat' vs kobold or sillytavern.

I mean, why else would SamA be in shambles?
https://www.newsweek.com/sam-altman-india-project-indus-1919694
>>
Whats up with chuds obsessing over Indian people?
>>
>>103287393
oh I wasn't making a reference to indians
but it's funny that just saying "poo" makes people think of that now
>>
File: TwinklingMischeviously.png (1.75 MB, 1232x816)
Good night lmg
>>
File: 1597658236457.png (7 KB, 905x28)
Um? Because i fucking wanted too? Is that enough reasoning?
>>
>>103287315
LM Studio
>>
>>103287503
A good night to you Miku
>>
>>103287504
It's lazy reasoning but I guess technically that is enough.
>>
>>103199596
An update. The 6.11.9 kernel finally dropped on Debian testing.
As expected, the patch introduced in 6.11.8 from like 10 threads ago does absolutely nothing to improve CPU inference t/s
>>
>>103286998
(>>103287002 is a different anon.)
Right now I'm on L3.1 Nemotron 70B Q6K. It's too chatty for general purpose but that's favorable for programming because it explains thoroughly which is good when you're asking about poorly documented things. It has made mistakes on things that other (L3 lineage) models get right, but I haven't had any major problems with it since I started using it.

Qwen has never done me right. It gets basic shit right but I've never seen a Qwen survive tricky questions.
>>
Can you mix nvidia and amd cards on the same Linux machine for different tasks? I want to add a 3070 just for the fish and other small projects that radeon can't do.
>>
>>103287750
yes
>>
>>103287132
It's actually really good for realistic stuff, though you need to type a caption to get good results. They did give a prompt you can feed an LLM to get good prompts for what you want.
>>
https://huggingface.co/datasets/TheDrummer/AmoralQA-v2
>>
Kill yourself.
>>
>>103286811
hello saar, I've so far tried rocinante, stheno and cydonia. How does Tiger-Gemma compare?
>>
>>103287899
cool
>>
>>103287899
Interesting, have you already used it in your models or do you plan to use it in future tunes?
>>
>>103287899
Thanks for the free redteaming dataset
>>
>>103288215
I haven't felt the need to use it since Gemma. Tiger Gemma v3 uses a /slightly/ broken version of AmoralQA v2. (Lists don't newline)
>>
File: svge.png (3 MB, 1024x1024)
>>103286774
Yeah, I made an album cover with it. Then photoshopped it.
The idea is the whole album is supposed to be shuffled when you first hear it or when you hear it at all, so that way everyone hears it differently and can share mixes since each song begins and ends the same way.
I took inspiration from Churchill's Love and Info. https://en.wikipedia.org/wiki/Love_and_Information
I should be releasing it February
>>
>>103286673
>(11/22) LTX-Video: Real-time video generation on a single 4090: https://github.com/Lightricks/LTX-Video
Has anyone gotten this to work on a single 24gb card?
>>
>>103288336
I'm buying a GTX Titan, I'll get back to you on that.
>>
>>103288349
I keep OOMing, despite having all 24564MB free (zero processes running CUDA)
The suggested `PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True` doesn't help
I might try on an A40 I have access to tomorrow and see how many bytes it actually uses at peak.
>>
>>103287899
Too bad Elon doesn't come here anymore. He'd get this into grok.
>>
>>103288336
use the ComfyUI node
>>
How do I merge multiple safetensors files into one? Why are the beginners guides so advanced?

model-00001-of-00014.safetensors
model-00002-of-00014.safetensors
model-00003-of-00014.safetensors
model-00004-of-00014.safetensors
model-00005-of-00014.safetensors
model-00006-of-00014.safetensors
model-00007-of-00014.safetensors
model-00008-of-00014.safetensors
model-00009-of-00014.safetensors
model-00010-of-00014.safetensors
model-00011-of-00014.safetensors
model-00012-of-00014.safetensors
model-00013-of-00014.safetensors
model-00014-of-00014.safetensors
>>
>>103288602
just download the gguf you idiot
>>
>>103288602
There's no need to do it, but if you did, you'd need to write your own Python script. I used one from way back and it still suffices, but you have to give the script a .json file listing all your shards as input.
https://huggingface.co/leafspark/Mixtral-8x22B-v0.1/commit/e1cc7f15d97ea80f93ae4cb2d7196879610cac99
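Something like this works as a minimal sketch (untested here; assumes the shards are standard HF safetensors splits, fit in RAM, and sit in the current directory):

import glob
from safetensors.torch import load_file, save_file

merged = {}
for shard in sorted(glob.glob("model-*-of-00014.safetensors")):
    merged.update(load_file(shard))  # each shard is just a dict of tensor name -> tensor

# some loaders want the "pt" format marker in the metadata
save_file(merged, "model.safetensors", metadata={"format": "pt"})

Again, there's rarely a reason: llama.cpp's convert script and most loaders read the sharded files plus the index json directly.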
>>
>>103288336
Yes.
>>
>>103288654
I remember trying to make Small write the Python script for merging files and it botched it every single time. Its apologies became increasingly desperate, which was fun.
>>
>>103286754
I want to gift tasty food to this Miku
>>
When using stuff like vllm/aphrodite is there a difference between using pre-compressed FP8 version of the model from HF or just using --load-in-8bit flag on full size model?
>>
>>103289009
>Currently, we load the model at original precision before quantizing down to 8-bits, so you need enough memory to load the whole model.
At least with vLLM. The final model is the same if the one quanted offline is done with the dynamic method that doesn't need calibration.
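If anyone wants to compare, a rough sketch with vLLM's Python API (kwarg names can shift between versions, and the FP8 repo name here is just a placeholder):

from vllm import LLM, SamplingParams

# On-the-fly: full-precision weights are loaded first, then quantized to FP8,
# so peak memory is the unquantized model.
llm = LLM(model="mistralai/Mistral-Small-Instruct-2409", quantization="fp8")

# Pre-quantized: point at an FP8 checkpoint instead; the quantization config
# stored in the repo is picked up automatically, so only the quantized weights load.
# llm = LLM(model="your-org/Mistral-Small-Instruct-2409-FP8")  # hypothetical repo

print(llm.generate(["Hello"], SamplingParams(max_tokens=16))[0].outputs[0].text)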
>>
hi bros, wanted to ask
how much compute and power are needed to merge together 70-100b models, and can you merge q6 quants or do you need like f16
>>
>>103289047
Thanks!
>>
>>103289047
I think it was fixed with Aphrodite (it should load the full model layer by layer before the quantization)
>>
>>103289607
I think that's for FP7 and below with another quant method.
>>
File: mmmmmmmmmmmmi.jpg (12 KB, 264x264)
https://files.catbox.moe/nm2kt0.jpg
>>
>>103289721
Weirdly wholesome
>>
>DeepSeek R1 release hath been forsooth annulled, on account of grave concerns o'er its safety.
'Tis done.
>>
>>103289721
wasn't expecting that
>>
>>103289727
>>103289733
we don't always onahole the migu
>>
File: 12412423457679.png (7 KB, 407x58)
>>
Mikutroons killed /lmg/.
>>
>>103290028
It was dead on arrival, something something le epic sekrit club that gatekeeps somewhat good models, shilling shitty ones instead. Many such cases.
>>
>>103290054
I'm new here, which are the good models?
>>
>>103290054
>t. newfag
For the first few months, it was a sekrit club not by gatekeeping, but because the setup to get things running was difficult enough that retards didn't even bother. One click installers were the death of /lmg/. The kofi shilling didn't start until the general was already a year old.
>>
>>103290054
>gatekeeps
Literally how? Anybody can go download and test whatever at any time.
There's no gates anywhere.
>>
File: mmmmmmmmmmmmi2.jpg (19 KB, 328x328)
https://files.catbox.moe/spz8we.jpg
>>
>>103290084
Not quite correct, a year would be around April 24, but Undi already had kofi links during the llama2 merge era in October 23
>>96689447
>>96689473
>Sorry, Undi-Senpai, I'll donate to your kofi if you forgive this transgression. Please?
>>
>>103290110
There's no point in holding back anon, make this general your dumpster ground already.
>>
don't @ me retard
>>
>>103290137
>don't @ me retard
>>
>>97465971
>Remember to always download latest undi model. Always say it is better than previous ones. Always click the kofi button.
Things didn't change much huh? Now you just replace Undi with Drummer
>>
I'm interested in creating more "conversational" bots, similar to how character.ai works (or at least used to work when it was first released, I haven't been there since). So, short replies that can be continued in a following separate message and, from what I remember, don't try to bring up locations or anything of the sort. They really resemble actual chat messages.
How do these work? Are they purely system prompts? I've tried a little with those but couldn't manage to achieve anything close to what I want. I also would rather avoid changing the cards, considering how that's probably not how character.ai does them (as you make the cards yourself)
>>
File: 00007--sd3.5_large-20-4.5.jpg (821 KB, 3072x5376)
AI wasn't able to help.
koboldcpp-linux-x64-cuda1210 --multiuser 2 --usecublas 0 1 --port 5001 --quantkv 1 --flashattention --quantkv 1 --contextsize 32768 --model ./mistral-7b-instruct-v0.3-q4_k_m.gguf --gpulayers 8 --debugmode --ropeconfig 1 1000000

Generating (143 / 150 tokens) [(, 12.92%) ( night 35.36%) ( void 21.63%) ( darkness 13.26%)]
Generating (144 / 150 tokens) [( star 25.58%) ( rain 23.39%) ( ne 7.19%) ( cyber 5.89%)]
Generating (145 / 150 tokens) [(- 100.00%)]
Generating (146 / 150 tokens) [(stud 46.16%) (spe 23.97%) (filled 18.47%) (d 4.76%)]
Generating (147 / 150 tokens) [(ded 100.00%)]
Generating (148 / 150 tokens) [( sky 42.89%) ( night 49.85%) ( void 3.69%) ( exp 3.56%)]
Generating (149 / 150 tokens) [(. 100.00%)]
Generating (150 / 150 tokens) [( The 75.66%) ( A 6.37%) ( In 6.00%) ( B 5.59%)]

Using koboldai, how do I increase output tokens to more than 150?
>>
>>103290293
>AI wasn't able to help.
You are the cancer killing this general. Have you even tried looking at the output of the help option?
>>
>>103290293
In the sampler settings.
>>
>>103290316
Do not reply if you see obvious bullshit.
>>
>>103290316
>>103290332
Of course I did. WTF, if you can't help, don't.
>>
>>103290342
>if you can't help, don't.
This isn't a tech support forum
>>
>>103290084
>>103290126
>>103290151
You're all going to hate me after you see Behemoth v2's model card.
>>
>>103290348
Yes it is if it's about local models.
see >>103289329
>>103289009
>>
File: file.png (53 KB, 660x928)
>>103290359
based
>>
>>103289721
I like this Miku
>>
>>103290293
AI is good, but you should also read the documentation and see if the options there make sense for your use case.
>>
>>103290680
>read the documentation
The whole point of LLMs is so I don't have to do that anymore
>>
>>103290720
This is stunningly accurate and horrifying.
>>
>>103290680
I tried changing "context window" and "max_new_tokens" in the API with no luck. Maybe a limitation of the model and/or the API?
>>
>>103290720
The LLMs aren't keeping up with the new docs retard
>>
File: file.png (896 KB, 1379x1222)
https://github.com/ogkalu2/comic-translate

It's a little buggy (I cant get the automatic mode or inpainting to work) but this seems to have a lot of potential.

It doesn't have direct support for local models but you can change the `base_url` in the OpenAI client. Currently running it with mistral-small for translations.
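For anyone wanting to do the same, the redirect is just this (sketch; URL, port and model name are placeholders for whatever backend you run):

from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:5000/v1", api_key="not-needed")
resp = client.chat.completions.create(
    model="mistral-small",  # whatever name your local server exposes
    messages=[{"role": "user", "content": "Translate this speech bubble to English: ..."}],
)
print(resp.choices[0].message.content)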
>>
I was wrong about finetunes, there are good ones. I'm unironically surprised.
>>
>>103290841
Name them so I can tell you to buy an ad
>>
>>103290841
Your mythomax?
>>
>>96689222
>gets banned because he chose the wrong vocaloid
The absolute state of this worthless general that got taken over by a transsexual cult.
>>
>>103290316
No me.

>>103290720
I figured it out.
undocumented api value "max_length": 311
>>
Is Qwen-Turbo-1M opensource?
>>
>>103290359
The only thing I hate about behemoth is the size, ain't no one got enough vram/patience to run that shit
>>
>>103290252
I know my question is stupid but I'm not sure where else I'm supposed to ask.
>>
>>103290028
Cutesy moeshit and troons are inseparable things at this point in time, both rooted deep in infantilism.
>>
>>103290841
which one impressed you?
>>
https://x.com/Big_Uppy/status/1860492712669049191
>>
>>103291305
And you're still insecure about it big guy.
>>
>>103291708
Nah im good
>>
>>103291659
>Carson cluster
>Not Big Mac cluster
Into the trash it goes
>>
>>103291305
You don't get it.
The point is to keep retards like you out.
>>
>>103291856
>you are not welcome in our epic sekrit club!
k
>>
>>103288602
https://rentry.org/tldrhowtoquant
if you mean "how do I make my own gguf"
>>
My refractory period gave me an epiphany that the biggest problem now is fake context. Everything falls apart after 12k tokens. Possibly because of training data usually being in the much smaller range.
>>
>>103291928
>How to quant your own models
More like how to convert HF to GGUF. You should at least add an example of using llama-quantize.
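For the rentry, something along these lines (paths and quant type are examples; the script and binary names match current llama.cpp, older builds use convert-hf-to-gguf.py / quantize):

python convert_hf_to_gguf.py /path/to/hf-model --outtype f16 --outfile model-f16.gguf
./llama-quantize model-f16.gguf model-Q4_K_M.gguf Q4_K_M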
>>
>>103291932
Mistral models fall apart after two replies for me (in RP, they can do assistant slop well)
>>
hunyuan gguf?
>>
>>103291628
I'm not gonna shill it. I believe the only reason it's good is because Instruct fine-tuning of Mistral Small was so shit
>>
>>103292023
Perhaps your cards don't make sense?
>>
>>103292044
not supported
>>
File: 005459.png (1.42 MB, 896x1152)
>>
Every time I ask something that requires real-world knowledge during RP, every model immediately breaks character and switches into assistant mode with lists and "Let me know if you have any questions or need further assistance!" So annoying.
>>
>>103292005
Good idea, although I tried to keep it as tl;dr as possible.
added pip bootstrap and quantize/split instructions
>>
good morning, /lmg/
>>
Mkku cute. Keep posting her.
>>
>>103292155
System prompt issue pal
>>
>>103292155
Set up a safeword that you need to say that implies you want to break RP, emphasis that if you don't say the safeword that you want the RP to continue.
>>
>>103292262
Do you have a solution?
>>103292357
I've already tried "You do not break character for any reason" and other professional roleplayer cope, doesn't work.
>>
>>103292402
But did you or did you not try a safeword?
>>
>>103292063
Ever tried cutting down ctx length when you keep desperately rerolling to get it to stop being incoherent? It works beautifully but it is not a long term solution.
>>
>>103292412
No, but I'll give it a shot
>>
>>103292227
You are a nigger and a faggot.
>>
>>103292194
mornign betifel show bobis and venega
>>
File: MikuBob.png (1.18 MB, 1248x800)
>>103292457
I still don't think she looks right with a bob. What a strange obsession.
>>
>>103292456
This precise combination is actually great and is welcome here.
>>
>>103292482
wher verega whore
>>
>>103292456
>>103292492
GNAA represent!
>>
File: 00061-1096963739.png (1.3 MB, 1024x1024)
>>103292512
it appears to have been misplaced. please try again later
>>
>>103290367
>>103292102
Prompt?
>>
>>103292482
Looks fine to me
>>
>>103292573
Didn't mean to quote the first one
>>
>Testing out some 3.1 Nemotron quants
>One is vanilla IQ4_XS
>The other is abliterated.i1 IQ4_XS
>Ask it an obscure trivia question about a work of fiction.
>Vanilla knows it doesn't know, asks for more information. I give it a hint, it still doesn't know.
>Abliterated i1 also says it doesn't know, asks for info. I give it a hint, it recognizes and correctly describes.
For the record, only DBRX and L3.05 Storybreaker Minist has gotten this question superficially right 0-shot, and nothing's gotten it fully right.

Which is most likely the cause of the improved performance in Abliterated i1, the i1 or the abliteration effect? Or is it just placebo/chance?
>>
>>103293087
From what I've read about an LLM's ability to acknowledge its own lack of knowledge on a topic, I would assume it's the abliteration: it breaks the ingrained training that stops the model from speaking about things it doesn't confidently know.

t. retard
>>
>>103293117
Which is to say that your one-off worked, but overall/longer term, it would be detrimental as the model could offer solutions when it's just guessing as opposed to using 'grounded' knowledge
>>
>>103292492
/nu-lmg/
>>
https://x.com/lmarena_ai/status/1860118754921001206
>>
>>103293117
Thanks for your input.
I've been curious about it, since I don't really know how much of an effect i1 has (apparently it's a one-bit form of iMatrix or something like that), and how Abliteration works (is it something like dropping the layer that is used to send good responses down a stock refusal path?).

Anyway, I'm dumping both IQ4's because they failed my music theory test and am pulling Q5KS now for the same kind of testing. I've been using vanilla at Q6K, but if the abliterated i1 is better it might take that spot.

>abliteration preventing it from commenting on something its unsure of, breaking the ingrained training to prevent it from speaking about things it doesn't confidently know
>it would be detrimental as the model could offer solutions when its just guessing as opposed to using 'grounded' knowledge
Perhaps but the information it offered was all correct, and it's the kind of question that many models (including many L3.0 spins, Qwen coder, CR+, and Mistal Large) confidently hallucinate on, so I wouldn't expect much restraint.
>>
Her voice is sweet and gentle, but there's a playful spark in her closed eyes
>>
>>103293276
>apparently it's a one-bit form of iMatrix or something like that
No it's not, it's just mradermacher's naming convention

>What does the "-i1" mean in "-i1-GGUF"?
>"mradermacher imatrix type 1"
>Originally, I had the idea of using an iterational method of imatrix generation, and wanted to see how well it fares. That is, create an imatrix from a bad quant (e.g. static Q2_K), then use the new model to generate a possibly better imatrix. It never happened, but I think sticking to something, even if slightly wrong, is better changing it. If I make considerable changes to how I create imatrix data I will probably bump it to -i2 and so on.

>since there is some subjectivity/choice in imatrix training data, this also distinguishes it from quants by other people who made different choices.
https://huggingface.co/mradermacher/model_requests
>>
File: c_005511.png (232 KB, 447x447)
>>103292573
For LMG? Of course.
https://files.catbox.moe/lxids8.png
And one more.
https://files.catbox.moe/55qmh1.png
>>
>>103293224
>Math: #3
>682 votes
What kind of idiot uses *language* models as calculator?
>Overall #10
>Overall (Style Control): #17
Just another oversized starling like nemotron.
>>
I'll do a migu or two later tonight if I remember perhaps thobeit
>>
>>103293314
So it's all iMatrix, and I'm right to have been renaming my models to include the quant supplier since that will matter. Great, more variables to raise the noise floor when making comparisons. :D

>>103293345
Used "as" a calculator is a bit of a reach, but calculation can certainly come up in a conversation. Remember how we used to ask about how long it would take the laundry to dry to see if it handles parallel processes (and the LLMs of that era I guess never did) but that's also a math problem in disguise.
>>
Anyone here just type the other guy instead of using software as your condom?
>>
https://superuser.com/q/1862674/1775458
>>
>>103293345
>Based on your setup, let's calculate the total power consumption:
>GPUs: 2 x 200W = 400W CPU: 65W Storage (SSD + HDD): ~15W combined Motherboard & RAM: ~30W Total: 400 + 65 + 15 + 30 = 510W
>Your 750W power supply should be more than sufficient for this configuration, even with some headroom.
>>
>>103293345
if calculators did everything in math, there would be no mathematicians
>>
How can I use model from weights.gg locally?

I got model file downloaded already. I wanted to integrate it with GPT-SoVITs (other methods are welcome too) because I have trained custom model already (it's very lacking) and I found RVC model for this character which sounds amazing (judging by preview) so I thought to slap one on top of another.
>>
>>103293458
>poojan
lol
>>
File: durga.jpg (576 KB, 1280x720)
>>103293458
>obscure python shit
>C:\Users\Poojan
SAAR
>>
>>103293458
>Gujarat, India
>Student at ITM(SLS) Baroda University
>>
>>103293224
And based on that:
https://huggingface.co/sophosympatheia/Evathene-v1.0?not-for-all-audiences=true
>>
>>103293458
>>103293547
>>103293562
>>103293592
Stop samefagging and reposting reddit shit

https://www.reddit.com/r/LocalLLaMA/comments/1gyxwse/need_help_in_installing_llama3170binstruct/
>>
>>103293622
It gets even better:
>MODERATOR OF THESE COMMUNITIES
>r/IndiangirlskaRR
>r/ladkiyonkaRR
>>
>>103292194
I like this Miku
>>
>>103292194
I hate this Miku
>>
File: workflow.png (298 KB, 1990x997)
>>103288336
Yeah. I've got it running using comfyui. Here's the workflow if anyone has suggestions for changing settings.

Someone give me prompts. I'll post what it generates.
>>
>>103288336
I think the real question for vramlets like me is whether it can bake on a 4070 12GB at something reasonable, though not real time like the marketing promises for the 4090.
>>
>>103293709
>Here the workflow
Post the json so we can test your settings ourselves
>>
>>103293808
https://files.catbox.moe/nlle5g.json

Its one of the example workflows provided on the github.
>>
>>103293808
https://blog.comfy.org/ltxv-day-1-comfyui/
>>
File: 1731860364173671.jpg (1.36 MB, 2580x2009)
>>103291305
Anime. Website.
Seethe harder, amygdalalet
>>
>>103292194
I love this Miku
>>
>>103292194
where the fuck is that arm coming from?
>>
File: LTXVideo_00021.mp4 (294 KB, 768x512)
>>103293709
Seems like subsequent generations yield the same output, or at least extremely similar outputs. It also seems like the model was mostly trained on real life people and environments.

Not very impressive unless you're generating one of those two things. This is my first time with a video model though, so it may just be a skill issue on my part.
>>
>>103293979
It's far more trained on real stuff / movies. It was also trained on LLM-style captions, so they provide a prompt you can use to generate good prompts in the same style. It's also apparently v0.9 and they are still training it. It's really good already at realistic stuff, though, for such a small model.
>>
>>103293315
Nice
>>
>>103293709
A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
To compare to the similar Sora generation here: https://openai.com/index/sora/
>>
File: LTXVideo_00025.mp4 (1.09 MB, 768x512)
>>103294054
>A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colorful lights. Many pedestrians walk about.
>>
>>103294086
I assume that is with the default 30 steps. Use more steps / slightly higher CFG.
>>
>>103294086
Oof. Well, Rome wasn't built in a day. Thanks anon
>>
>>103294106
I bet you sora runs on a rack of H100s and still takes minutes to gen. This is instant on a 4090
>>
File: LTXVideo_00026.mp4 (1.13 MB, 768x512)
>>103294101
>steps 100
>cfg 5.0
>>
>>103294132
Into the trash it goes. Instant or not, with that quality, it's useless.
>>
im sure you guys hear this a lot but as someone who only horny rps once every month or so i can not deal with looking for proxies anymore so i think im going full local now. i am NOT asking for the best models dont worry i am capable of reading the thread. but any hard to find advice especially for cunny content in particular will be appreciated and i will make DISGUSTING new cards if i can get something good enough going
>>
>>103294184
>and i will make DISGUSTING new cards
keep them to yourself
>>
>>103294176
For stuff closer to what it was trained on its pretty good.
https://files.catbox.moe/wk3e6m.webp
>>
>>103294194
if you insist i can just post them to aicg still but i know you can resist the loli puke content
>>
>>103286673
I have an ai-related thing I want to do, and this seems like the most adjacent thread to what i want to make:
does anyone here know how to make ai voices? I want to make something that involves voice acting, and i want to use an ai to change my voice to another one, but I'm completely out of my element with using ai stuff beyond basic chatbot usage.
Also I'm going to preface by saying that my PC is from 2016 and most of my computer prowess is in cleaning and fixing 'puters, so if you could point me to a site before throwing a program at me, I'd be thankful.
Have a migu
>>
>>103294256
look up RVC voice conversion, there are shitloads of models of different people. old computer is probably fine as long as youre not hoping to convert your voice in realtime
>>
>>103294132
>>103294176
Learn2prompt
https://files.catbox.moe/2e5fqy.mp4
https://files.catbox.moe/2rw5ve.mp4
https://files.catbox.moe/mvhy7i.mp4
https://files.catbox.moe/np5p2p.mp4
https://files.catbox.moe/mpezda.mp4
https://files.catbox.moe/e9f1am.mp4
>>
File: transper_miku.png (235 KB, 472x522)
>>103294293
>RVC voice conversion
Alright, thanks anon
From personal experience, if you have some, is voice.ai good? I would use the first github link google gave me but it's entirely in moonrunes and I don't really fancy a chinese cryptominer on top of all the problems my pc has
>>
>>103286673
>read Unsloth documentation
>DPO (Direct Preference Optimization), ORPO (Odds Ratio Preference Optimization), PPO, KTO Reward Modelling all work with Unsloth.
Isn't the way you're supposed to do it reinforcement learning from human feedback though?
>>
>>103294304
You can post .mp4 files now fren.
>>
>>103294304
It's not about prompting, retard. Otherwise you would have posted a video that matches the Sora prompt. LTXV just can't do it.
>>
>>103293709
>video

ew, you smell
>>
>>103294351
Not more than one per post, tard.
>>
>>103294256
she is so cute
>>
>>103294352
https://files.catbox.moe/jp9ppm.gif
>>
>>103293928
>why has... vore
topkek
>>
>>103294336
if all you really care about is getting your voice converted non-realtime and dont want the freedom of setting up stuff locally then any of those paid sites are probably fine, just choose carefully because im sure some of them blow cock (im not familiar). i think some allow you to try for free. and be wary of the fact that anyone making money is likely to start panicking about legal stuff soon if they havent already and thus might wipe all the models of famous people (which is all of them) so you might be rugpulled
>>
>>103293224
No one gives a fuck about lmarena's leaderboard, it's shit.
>>
>>103294411
OpenAI and Google do give a fuck.
>>
>>103294469
It shows.
>>
>>103293276
The fact that they 'confidently hallucinate' is not a good or desired feature.
>>
>>103294411
Indians care a whole lot about it. One of the reasons the results suck and you see absolutely retarded things like Claude Opus not even being in the top 10 for creative writing.
>>
>>103294589
But humans confidently hallucinate all the time, this just means that we are getting closer to human tier AI
>>
File: 1732461706327769.png (43 KB, 628x819)
43 KB
43 KB PNG
>>103294469
One might even say it's the only thing they care about, kek
>>
>>103294589
Correct. Which is why I'm using questions that many models confidently hallucinate on as a test to find those that admit ignorance or a need for more context to guide it to the correct response.

>>103294645
>humans confidently hallucinate all the time
Humans are a more chaotic system, and let's not conflate being confident in misinformation versus what LLMs do, which is compose a statement that is highly correct in structure but highly incorrect in content.
>>
File: soyblonde.png (290 KB, 475x485)
>>103294645
indeed
>>
>>103294729
If humans are a more chaotic system, then we should make Models more chaotic. Since we are the pinnacle of evolution (intellect wise) then clearly making the architecture of AI's fundamentally chaotic will make them smarter.
>>
>>103294729
>a statement that is highly correct in structure but highly incorrect in content.
The New York Times.
>>
Can I use a 7900xtx with a 3090 for 48GB vram? I have to have an amd card for gayming.
>>
>>103294754
>we are the pinnacle of evolution (intellect wise)
Uncertain. Later Homo landed in the sweet spot of being strong but not strong enough to rely on strength, agile but not too agile, smart but not too smart, etc. in an environment and with a body structure such that tool evolution is feasible.

And don't forget that humans today and thousands of years ago are the same stinky monkeys. We got better at tools, that's the difference.

Crows, dolphins, octopus, elephants, there are a number of species who have intellectual potential but don't need to tool up to survive. They can utilize what they find, but never need to go next level like humans did.

Then, we got here by the law of large numbers. We've cranked out billions of thumb monkeys throughout history and maybe a few thousand have ever actually mattered to the future.

Hallucinatory LLM is a plan only if you simulate huge numbers and Darwin nearly all of them to the bit bucket. And that's a lot of electricity we can save by making LLM our next evolution of tool making.

>>103294759
topkek
>>
>>103294132
I think you forgot 'masterpiece' anon
>>
>>103294852
>is a plan only if you simulate huge numbers and Darwin nearly all of them to the bit bucket.
Is something like that actually feasible in the future? I don't think we have the compute required these days to cull them via the messy process of evolution, but 20 years down the line, 30? Maybe.
>>
>>103294351
>You can post .mp4 files now fren.

>4chan
>metadata
>>
i didn't make it but this card is godtier https://files.catbox.moe/dr7l25.png
>>
>>103295022
>.png
You can't fucking fool me
>>
>>103295022
how do i use this
>>
>>103295031
>>103295050
It's a gay pedo card. Don't bother.
>>
>>103294949
It's feasible when we create the killbots and they run out of humans to kill-all-humans (or hit their kill limit and there's nobody left to reset their kill counters) and they start picking apart each other for replacement parts. Before then, it's a matter of teaching to the test. We can't stop arguing over benchmarks and till we do there's no objective way to decide what does and doesn't make the cut.

LLMs weren't designed to be life forms. We wanted tools. Google search sucks now because of SEO and we like the idea of being able to ask the air a question and receive an answer. While watching football today I saw a commercial about just that. I guess Gemini is getting a new push into the new smartphones. Yay for that. But evolution in nature is not about becoming the best tool, but chasing a moving target of optimal enough to thrive but suboptimal enough that when conditions change, a few will survive. LLMs don't face that pressure. They get Darwin'd, depending on user, by being wrong, being not sufficiently politically correct, or shivering barely above a whisper too often. If a model can get optimally on top of all three, the problem is solved and the tool becomes finalized, like how we don't see many variations on paper clips, hammers, and coat hangers. Just a few to suit our particular needs.
>>
>>103295091
yes but how do i chat with picture
>>
>>103295101
Import it in ST or extract the png text chunk called "chara", decode b64 and you get the json... just use ST. You're too much of a retard.
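If you want to peek without ST, a quick sketch (assumes the card keeps its data in the usual "chara" tEXt chunk; v2 cards nest the fields under "data"):

import base64, json
from PIL import Image

img = Image.open("dr7l25.png")
card = json.loads(base64.b64decode(img.text["chara"]))  # PNG text chunks via Pillow
data = card.get("data", card)
print(data.get("name"), (data.get("description") or "")[:200])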
>>
Can someone share their sampler settings for mangum-v4-12B? I finally updated my ST and I have access to DRY.
>>
>>103295101
Open a tab with ST and drag the image into it
>>
>>103295128
how do i connect it into ollama?
>>
>>103295142
Scream "Please connect to Ollama" really loud near your mic.
>>
>>103295091
Woah, I almost ignored it but this message really sold it to me, thanks
>>
>>103295022
[Name("Ciel")
{Age("twelve" + "child")
Height(petite)
Personality("Arrogant" + "snobbish" + "greedy" + " "needy" + "rude" + "jealous" + "bossy" + "demanding" + "vengeful" + "immature" + "authoritarian" + "cling")
Goals("become your husband")
Features("slim body" + "soft skin" + "petite" + "pale skin" + "pink eyes" + "messy black hair with bangs almost covering his eyes")
Loves("you" + "sweets" + "books" + "chess" + "having your full attention")
Hates("{{char}} hates people because people have hurt him in the past, so {{char}} will always avoid people" + "{{char}} fears to be alone because he feels vulnerable and he never wants to be alone")
Backstory("{{char}} was abandoned in an orphanage since he was a baby, he always wanted a family, but he had a completely insufferable and violent personality. Even so, you adopted him and now {{char}} is trying to deal with his romantic feelings for you.)}]

Brother, please write your cards in plain english without any stupid formatting. We really need to move on from this meme. Also, don't bother writing a card if it isn't at least 800 tokens long.
>>
File: 1725890862276131.png (281 KB, 853x480)
>>103295179
Not this fucking anime again
>>
>>103295179
But look... the character is so complex and has so many facets... he's bossy, authoritarian AND demanding...
>>
>>103295216
But those are all the same facet...
>>
>>103295227
Yeah... that was the joke, anon...
>>
>>103295234
...
>>
>>103295022
>>103295179
Please don't feed the journos.

>Brother, please write your cards in plain english without any stupid formatting.
JSON works fine because models were trained on it, this formatting is indeed stupid.

>Also, don't bother writing a card if it isn't at least 800 tokens long.
Short cards work best in my experience. See BN.
>>
>>103295179
Hello, I am a time traveler from the early 2020s. When did we abandon formatting and favor plain text?
>>
>>103295179
W++ is one of the few objectively effective things you can do to improve your output. It beats samplers and finetunes hands down.
>>
>>103295250
>JSON works fine because models were trained on it, this formatting is indeed stupid.
The json is what ST reads. There's no json in the data in that card that gets sent to the model.
>>
>>103295250
>Don't feed the journos.
The journalists can suck my cock. God forbid people write WORDS and commit thought crimes.
>Short cards work best in my experience.
I've noticed the opposite in mine. Models tend to repeat and fall into the same scenarios over multiple sessions with low token character cards. You need to give the model more tokens to work with so it can branch out into larger and more diverse scenarios. This is assuming your writing is good.

>>103295271
>When did we abandon formatting and favor plain text?
After pyg6b

>>103295277
W++ (and other formats) are an absolute meme and there is no evidence to suggest it leads to better or more accurate output. In fact, I've noticed the opposite when comparing formatted cards vs non formatted cards. If you have proof, I would like to see it.
>>
>>103295338
So you admit that certain thoughts can be classified as crime then?
>>
>>103295432
I think that you suck!
>>
The holy trinity of horrible cards:
>w++
>wiki copypastes
>ai-generated character cards with slop baked straight into the definitions
>>
>>103295432
Nonsense.
Referencing the notion that thoughts can be criminal does not mean that any thoughts are criminal.
>>
>>103295464
You forgot wrong grammar and spelling.
My experience is that that just leads to parts of the context being ignored.
>>
>>103295464
This. I've noticed that you can have a model rewrite formatted character cards in proper and plain english and get decent results if you manually edit the slopped portions. Although the best option is to write the card from scratch yourself with proper grammar and structure.
>>
File: protocolactivated.png (1.28 MB, 1248x800)
>>103295096
Its not that I mind being murdered by killbots, but the idea that they'll be using python and json to do it just grosses me out
>>
>>103295577
>python and json
That's the best part actually
>>
>>103295577
If they're using Python they'll be only 2.5% as efficient at it as they could be.
Of course they'll just have the LLM's rewrite them in Zig and then it's game over for the meat bags.

>>103295620
>json
I know it has a maligned origin, but what made JSON become such a meme?
It's like there's a spectrum from XML being fat with markup through JSON and I guess YAML and that other one and finally you're down to INI and simple KEY=VALUE flat files.
I kinda felt like JSON was a sweet spot of not too much markup but still plenty of features.
>>
>>103295464
OMG look in the mirror, sister. It worked.
Your incessant blathering and seething has actually turned you into a real woman! It's a miracle.
>>
>>103295678
Struck a nerve?
>>
The Holy Trinity of horrible posters:
>Complains about the quality of contributions made by others
>Never actually contributes anything to the hobby, themselves.
>Actively demeans anybody who does contribute
>>
>>103295654
Only retards are complaining about it, json is perfect as it is.
>>
File: JASON.png (133 KB, 341x400)
>>103295720
JSON!
>>
>>103295720
>not using XML
hello saaar
>>
File: IMG_0087.jpg (862 KB, 1488x1317)
>>
>>103295705
>The Holy Trinity of horrible posters:
>posts irrelevant vocaloid pictures
>posts irrelevant vocaloid pictures
>melts down when OP doesn't have a vocaloid picture
ftfy
>>
>>103295654
>JSON
I probably hate json from needing to wrangle it with jq. What an obtuse tool
>>
>>103295654
JSON is fine but it's bad for AI, way too many tokens
>>
Interview-style is the best format for character cards. Just 1000+ tokens of hand-crafted in-character dialogue of your character that forces the model to act exactly as desired.
>>
Hi petr*. What triggered you today? A picture of an attractive anime woman? Did it remind you of something you will never be? You can tell us, we're listening.
>>
>>103295940
Post format?
literally how
>>
json dese nuts
>>
File: Magic Bullet.jpg (79 KB, 470x594)
>>103286673
With regards to creating characters, what depth in the context do you usually put the personality, body, and backstory of a character?

Back when models were smaller and far less powerful, I got in the habit of inserting character details at a lower depth. That was a trick that made even small and weak models remember character details as the roleplay went on.

However, I have since upgraded my machine and can run 70b models. Should I be putting character details at the beginning of the context?
>>
File: 635 - SoyBooru.png (72 KB, 340x512)
>>103296054
NIGGA
>>
deepseek r1 will save local models
>>
DeepSeek R1 will never be released and local will die
>>
>>103293952
Gentlemen, be on your guard, for your adversary the devil roams around like a lion shifting shape into anime form, seeking whom he may devour. Count the fingers and toes, 20 max, you got 21? Run, son!
>>
LeCun is being apprehended for public urination as we speak
>>
>>103296405
You could just be a normal person and use ai to write stories.Then you wouldn't need perfect responses.
>>
File: file.png (61 KB, 688x152)
WTF has anyone heard of top-nsigma sampling? nerdy math shit I don't understand
https://arxiv.org/abs/2411.07641
https://github.com/PygmalionAI/aphrodite-engine/pull/825
https://github.com/SillyTavern/SillyTavern/pull/3094
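As far as I can tell from the paper, the filter itself is tiny: keep only tokens whose logit is within n standard deviations of the max logit, then sample from what's left. Rough sketch, not the reference implementation:

import torch

def top_nsigma_filter(logits: torch.Tensor, n: float = 1.0) -> torch.Tensor:
    threshold = logits.max() - n * logits.std()
    return logits.masked_fill(logits < threshold, float("-inf"))

logits = torch.randn(32000) * 3  # stand-in for one step's vocab logits
probs = torch.softmax(top_nsigma_filter(logits, n=1.0), dim=-1)
next_token = torch.multinomial(probs, 1)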
>>
>>103296533
>logits not you need are all.pdf
>>
>>103296533
wtf is wrong with that title
>>
>>103296590
Attention is all you need as long as you ask one question and get one answer and it is all less than 2-4k tokens.
>>
File: 005579.png (1.29 MB, 896x1152)
>>
>>103296617
"Not all logits are [what] you need" is be "you don't need all logits" regrammarized.
>>
I'm a noob dabbling with koboldAI, and have a question regarding world info:
if two characters are nearly (but not quite entirely) identical, then is it practical to merge them into a single definition for more efficient token usage? Will it risk continuity errors if they're ever not both present simultaneously?
For the ways in which they differ, I was thinking about merging both their features into a single statement (e.g. hair length:A short-B long); please tell me if I'm being retarded and/or there's a better way.
>>
File: Screenshot_2024_11_24-4.png (91 KB, 1924x856)
https://files.catbox.moe/els78y.py

Here is my completed, unofficial SMT implementation. I was going to wait to do evaluations, but I realized those will take too long, so I'm just uploading it now. I also made a PEFT version that supports quantization, exporting the weights as an adapter model, and merging. I will release the PEFT version soon and explain some of my findings in detail (as well as provide a guide and example notebook) but right now I'm tired. I've spent all day hacking away at this and confirming that it works.

I'll note that I've only tested this on a single GPU. I'll test multi-GPU later, but your mileage may vary. If you don't care for quantization or separating the adapter weights from the model, use this version. If you do care about those things, wait for the PEFT release.

Even if SMT isn't better than LoRA performance-wise (again, I haven't done evaluations yet) I can at least assert that it does come with some substantial memory improvements (see picrel). I'll explain in more detail later, but I'm feeling cautiously optimistic about all this.
>>
>>103296813
models can get confused pretty easily if you do that, unless you're writing an entire novel for the character descriptions you're better off splitting them up, not necessarily into different cards but formatting them into split sections of 'charname is X, charname does Y, charname wears Z', I'd say about 50% of places where you would normally use a pronoun should be the name instead, just to reinforce the associations
>>
>>103296813
I don't use that, but i suspect being as specific as possible is the best. They have enough trouble just keeping track of small details. Bigger/better model always helps.

Side comment or whatever. Have you ever read a book? Even a bad one? In all the stuff i've read, those types of descriptions are made once and rarely ever mentioned once you have a face in your head for the character. Either fucking long stuff like Asimov's Foundation series, his short stories (azazel is fun), everything by John Varley (also long and short form), about a dozen discworld books, some other random shit i've read, even a kind of smutty one once (it was awful) and character descriptions just don't do much. The details you make in your head are much better than whatever the model could come up with. Unless you use the model to gen images as well, i suppose.
But i write stories with them, not RP, so what do i know...
>>
Anons, im sick... claude is not coming back

How do i get Magnum v3? or any local model similar to claude 2?
>>
>>103297087
>How do i get
with your browser... or curl, wget, git... you must have at least one of those. hf.co has a few models for download i think. May be worth checking it out.
>or any local model similar to claude 2?
huff... maybe someone else will feed you before you starve... you sit tight...
>>
>>103296930
Awesome, will check it out.
Unfortunately my set up pretty much requires multi GPU and quantization. I wonder how hard it would be to incorporate this into axolotl or qlora-pipe... if you do make a PEFT version that integrates into HF peft, it should be trivial.
>>
>>103295101
Put it right inside your arse
>>
>>103297081
>those types of descriptions are made once and rarely every mentioned once
And never contradicted once. This is what people want from character defs.
>>
>>103297151
>if you do make a PEFT version
Well, I've already made the PEFT version, it's just not released yet. But I'll probably release it at the end of today or tomorrow, I just need to take a break. But the PEFT version is already integrated into HF peft and can be accessed just how you'd access any PEFT model.

from peft import SMTConfig, SMTModel, get_peft_model

config = SMTConfig(
    peft_type="SMT",
    task_type="CAUSAL_LM",
    target_modules=["q_proj", "v_proj", "k_proj"],
    sparsity_ratio=0.05,
    block_size=256,
    selection_method="GW",
    dataloader=dataloader,
)

model = get_peft_model(model, config)
>>
File: file.png (115 KB, 1202x361)
>>103297087
For what purpose and on what hardware are you trying to replace Claude 3.5 (I assume you said 2 by mistake, since local has long equaled and surpassed it)? Local means you need adequate hardware to run it; you cannot run the 70B Nemotron model that actually surpasses old Claude in RP, for example, if your PC is a potato.
>>
>>103297268
Awesome!
>>
>>103297270
>that actually surpasses old Claude in RP
how to tell that a benchmark is complete joke
>>
File: overview.png (54 KB, 405x441)
>>103297282
The benchmark is the only objective measure of LLM RP performance we have; everything else is personalized lists and preferences, which people like you have a nasty habit of dismissing as shilling and marketing. It even has a paper that lays everything out.
https://arxiv.org/pdf/2409.06820
If you disagree with it, make your own benchmark.
>>
>>103297216
>And never contradicted once. This is what people want from character defs.
Sure. I just cannot fathom caring about it that much. I did a little cult-mystery thingy once with this detective chick. We solved some resident evil-type mansion puzzles and ended up performing "The Grand Conjuration" (an opeth song). She sacrificed me to a daemon to make it take a nap and stop all the ritual murdering for a few hundred years. I find that much more interesting than how her curls bounced around in sync with her tits or whatever the fuck. For all i know she was bald.
I think the point of these things is to loosen up and let the model tell the story instead of wanting to direct every single detail. At that point just write the whole story and have a wank. No wonder RPers get bored so easily. They've played their fantasy hundreds of times and still angst about hair colour instead of just rolling with the punches and see what's next...
>>
This thread is sending shivers down my spine.
>>
>>103297270
i don't care about newer models
I liked claude 2, so any local model that answers the most similarly to it will be okay for me; i guess it's also less demanding

but to be honest, i've been using proxies all this time... so i know nothing about how to set up a local model or if there's another way

I use ST
>>
File: gpt wplustplus.jpg (144 KB, 813x746)
144 KB
144 KB JPG
I still use W++ from 2023 cause it works. Post a better format.

Ex:
>pic related
>>
>>103297441
nta. We don't know what hardware you have, so we don't know what to recommend.
But first get either llama.cpp or kobold.cpp set up and test it with
>https://huggingface.co/bartowski/Llama-3.2-1B-Instruct-GGUF
at Q8_0 or whatever. It's a shit model, but it's just for you to learn how they work. Once you have it running, worry about what model to use. You have a few thousand to choose from.
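If you'd rather script it than click around in a UI, here's a rough sketch using the llama-cpp-python bindings instead of the raw llama.cpp binary (the GGUF filename is a guess; use whichever quant you actually downloaded):

from huggingface_hub import hf_hub_download
from llama_cpp import Llama  # pip install llama-cpp-python

# grab one quant of the test model above; filename assumed from bartowski's usual naming
path = hf_hub_download(
    repo_id="bartowski/Llama-3.2-1B-Instruct-GGUF",
    filename="Llama-3.2-1B-Instruct-Q8_0.gguf",
)

llm = Llama(model_path=path, n_ctx=4096, n_gpu_layers=-1)  # -1 = offload every layer that fits
out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Say hi in one sentence."}]
)
print(out["choices"][0]["message"]["content"])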
>>
>>103297481
Show the output you get from that prompt. The screenshot alone is useless for judging it.
>>
>>103297490
thanks anon, i will do what you said

And from what some friends that use local models have told me, my pc should be enough for some of the heavy ones

But i just want one that is like claude 2, nothing more, nothing less
>>
>>103297481
What's W++?
>>
>>103297520
Both llama.cpp and kobold.cpp have their own UIs. They're good enough to test that it works. You can worry about connecting ST to either later. I'm sure the ST docs show how to do it somewhere.
>my pc should be enougth for some of the heavy ones
Keep your expectations low. My understanding of what a beefy computer is changed once i got into this.
Once you get it running, if you want model recommendations, you will need to post your specs.
Here are some reference numbers
>https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference
You will never find claude 2. With luck, you'll find something just as good, but different.
>>
>>103297607
A stack of hay in the shape of an airplane.
>>
File: Untitled.png (2.54 MB, 1080x3202)
2.54 MB
2.54 MB PNG
High-Resolution Image Synthesis via Next-Token Prediction
https://arxiv.org/abs/2411.14808
>Denoising with a Joint-Embedding Predictive Architecture (D-JEPA), an autoregressive model, has demonstrated outstanding performance in class-conditional image generation. However, the application of next-token prediction in high-resolution text-to-image generation remains underexplored. In this paper, we introduce D-JEPA⋅T2I, an extension of D-JEPA incorporating flow matching loss, designed to enable data-efficient continuous resolution learning. D-JEPA⋅T2I leverages a multimodal visual transformer to effectively integrate textual and visual features and adopts Visual Rotary Positional Embedding (VoPE) to facilitate continuous resolution learning. Furthermore, we devise a data feedback mechanism that significantly enhances data utilization efficiency. For the first time, we achieve state-of-the-art high-resolution image synthesis via next-token prediction.
https://d-jepa.github.io/t2i
Code and models not posted yet. for the jepabros
>>
>>103297733
but can it draw hands?
>>
>>103297733
>VoPE
It would be funny if they found a way to make it VaPE.
>>
File: Untitled.png (1.51 MB, 1080x2654)
1.51 MB
1.51 MB PNG
DyCoke: Dynamic Compression of Tokens for Fast Video Large Language Models
https://arxiv.org/abs/2411.15024
>Video large language models (VLLMs) have significantly advanced recently in processing complex video content, yet their inference efficiency remains constrained because of the high computational cost stemming from the thousands of visual tokens generated from the video inputs. We empirically observe that, unlike single image inputs, VLLMs typically attend visual tokens from different frames at different decoding iterations, making a one-shot pruning strategy prone to removing important tokens by mistake. Motivated by this, we present DyCoke, a training-free token compression method to optimize token representation and accelerate VLLMs. DyCoke incorporates a plug-and-play temporal compression module to minimize temporal redundancy by merging redundant tokens across frames, and applies dynamic KV cache reduction to prune spatially redundant tokens selectively. It ensures high-quality inference by dynamically retaining the critical tokens at each decoding step. Extensive experimental results demonstrate that DyCoke can outperform the prior SoTA counterparts, achieving 1.5X inference speedup, 1.4X memory reduction against the baseline VLLM, while still improving the performance, with no training.
https://github.com/KD-TAO/DyCoke
No code posted yet. Seems good for local usage.
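Since the repo isn't up yet, here's only a toy of the "merge redundant tokens across frames" idea from the abstract, not DyCoke itself (the threshold, shapes, and position-wise matching are all made up for illustration):

import torch
import torch.nn.functional as F

def merge_redundant_frame_tokens(tokens: torch.Tensor, threshold: float = 0.9):
    """Toy temporal merging: tokens is (frames, tokens_per_frame, dim).
    A token in frame t is dropped if it is nearly identical (cosine sim > threshold)
    to the token at the same spatial position in frame t-1."""
    frames, n, d = tokens.shape
    kept = [tokens[0]]          # keep the first frame in full
    prev = tokens[0]
    for t in range(1, frames):
        cur = tokens[t]
        sim = F.cosine_similarity(cur, prev, dim=-1)   # (n,) per-position similarity
        kept.append(cur[sim < threshold])              # keep only tokens that changed
        prev = cur
    return kept

# a nearly static clip collapses to a fraction of the original token count
video_tokens = torch.randn(1, 196, 768).repeat(8, 1, 1) + 0.01 * torch.randn(8, 196, 768)
print([k.shape[0] for k in merge_redundant_frame_tokens(video_tokens)])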
>>
File: Untitled.png (163 KB, 1240x1290)
163 KB
163 KB PNG
Simplifying CLIP: Unleashing the Power of Large-Scale Models on Consumer-level Computers
https://arxiv.org/abs/2411.14789
>Contrastive Language-Image Pre-training (CLIP) has attracted a surge of attention for its superior zero-shot performance and excellent transferability to downstream tasks. However, training such large-scale models usually requires substantial computation and storage, which poses barriers for general users with consumer-level computers. Motivated by this observation, in this paper we investigate how to achieve competitive performance on only one Nvidia RTX3090 GPU and with one terabyte for storing dataset. On one hand, we simplify the transformer block structure and combine Weight Inheritance with multi-stage Knowledge Distillation (WIKD), thereby reducing the parameters and improving the inference speed during training along with deployment. On the other hand, confronted with the convergence challenge posed by small dataset, we generate synthetic captions for each sample as data augmentation, and devise a novel Pair Matching (PM) loss to fully exploit the distinguishment among positive and negative image-text pairs. Extensive experiments demonstrate that our model can achieve a new state-of-the-art datascale-parameter-accuracy tradeoff, which could further popularize the CLIP model in the related research community.
kinda cool but no code and was written by one guy who I couldn't find on github or twitter so eh
>>
>>103297845
help me understand how t5 controls clip
>>
>>103297860
https://fluxai.dev/blog/tutorial/2024-09-16-how-flux-ai-uses-clip-and-t5-to-parse-prompts
>>
>>103297890
>CLIP tokenizes the input and finds reference images.
it what?
>>
File: Untitled.png (1.77 MB, 1080x2412)
1.77 MB
1.77 MB PNG
Efficient Long Video Tokenization via Coordinate-based Patch Reconstruction
https://arxiv.org/abs/2411.14762
>Efficient tokenization of videos remains a challenge in training vision models that can process long videos. One promising direction is to develop a tokenizer that can encode long video clips, as it would enable the tokenizer to leverage the temporal coherence of videos better for tokenization. However, training existing tokenizers on long videos often incurs a huge training cost as they are trained to reconstruct all the frames at once. In this paper, we introduce CoordTok, a video tokenizer that learns a mapping from coordinate-based representations to the corresponding patches of input videos, inspired by recent advances in 3D generative models. In particular, CoordTok encodes a video into factorized triplane representations and reconstructs patches that correspond to randomly sampled (x,y,t) coordinates. This allows for training large tokenizer models directly on long videos without requiring excessive training resources. Our experiments show that CoordTok can drastically reduce the number of tokens for encoding long video clips. For instance, CoordTok can encode a 128-frame video with 128×128 resolution into 1280 tokens, while baselines need 6144 or 8192 tokens to achieve similar reconstruction quality. We further show that this efficient video tokenization enables memory-efficient training of a diffusion transformer that can generate 128 frames at once.
https://huiwon-jang.github.io/coordtok
Has video examples
https://github.com/huiwon-jang/CoordTok
The repo isn't live yet. Seems like a good day for video stuff.
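Until the code drops, here's only a toy of the factorized triplane lookup the abstract describes, not the paper's implementation (plane resolutions, channel count, and the missing patch decoder are placeholders):

import torch
import torch.nn.functional as F

def sample_plane(plane: torch.Tensor, u: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    """Bilinearly sample a (C, H, W) feature plane at normalized coords u, v in [-1, 1]."""
    grid = torch.stack([u, v], dim=-1).view(1, -1, 1, 2)                 # (1, N, 1, 2)
    feat = F.grid_sample(plane.unsqueeze(0), grid, align_corners=True)   # (1, C, N, 1)
    return feat.squeeze(0).squeeze(-1).T                                 # (N, C)

def triplane_feature(planes, x, y, t):
    """Feature for each (x, y, t) query as the sum of three plane lookups."""
    xy, xt, yt = planes
    return sample_plane(xy, x, y) + sample_plane(xt, x, t) + sample_plane(yt, y, t)

C = 64  # arbitrary channel count for the toy
planes = (torch.randn(C, 32, 32), torch.randn(C, 32, 32), torch.randn(C, 32, 32))

# randomly sampled (x, y, t) patch coordinates, as in the training objective above;
# in the paper these features would go through a decoder to reconstruct pixel patches
N = 256
x, y, t = (torch.rand(N) * 2 - 1 for _ in range(3))
print(triplane_feature(planes, x, y, t).shape)  # torch.Size([256, 64])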
>>
>>103297515

>Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss-6.0bpw-h6-exl2-rpcal

Suggest another format.
>>
>>103297918
Very cool, I guess efficient video tokenization is quite the challenge.
>>
File: 1284708923567235.jpg (54 KB, 735x643)
54 KB
54 KB JPG
>>103297963
>Mixtral-8x7B-Instruct-v0.1-LimaRP-ZLoss
BASED

what are your settings?


