/g/ - /lmg/ - Local Models General - Technology


File: longu.jpg (99 KB, 640x1536)

/lmg/ - Local Models General Anonymous 09/02/25(Tue)17:57:36 No.106467368

/lmg/ - a general dedicated to the discussion and development of local language models.

LongMikuCat is Long Edition

Previous threads: >>106460375 & >>106454136

►News
>(08/30) LongCat-Flash-Chat released with 560B-A18.6B∼31.3B: https://hf.co/meituan-longcat/LongCat-Flash-Chat
>(08/29) Nvidia releases Nemotron-Nano-12B-v2: https://hf.co/nvidia/NVIDIA-Nemotron-Nano-12B-v2
>(08/29) Step-Audio 2 released: https://github.com/stepfun-ai/Step-Audio2
>(08/28) Command A Translate released: https://hf.co/CohereLabs/command-a-translate-08-2025
>(08/26) Marvis TTS released: https://github.com/Marvis-Labs/marvis-tts

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm

Anonymous 09/02/25(Tue)17:58:05 No.106467371

File: d426491a8ccc3576bda59d8aa(...).jpg (954 KB, 1500x1262)

►Recent Highlights from the Previous Thread: >>106460375

--Optimizing 3x 3090 GPU setup for large model inference with RAM and heat management:
>106463968 >106464009 >106464026 >106464042 >106464168 >106464130 >106464153 >106464564 >106464199 >106464326 >106464443 >106464472 >106464538
--Evaluation of Microsoft VibeVoice's 1.5b model and voice cloning performance:
>106460492 >106461427 >106461474 >106461630 >106463138 >106463251 >106463403 >106463413 >106463443 >106463524 >106463598 >106463633 >106467118
--Analysis of Apertus: ETH Zurich's open-source multilingual LLM with performance and data concerns:
>106461958 >106462004 >106462003 >106462019 >106462228 >106462298 >106462408 >106462037
--Model testing and content moderation challenges in story generation:
>106460777 >106460853 >106460935 >106461028 >106461750 >106465912
--Challenges with merged 12B models and the case for using original or larger models:
>106463279 >106463304 >106463367 >106463470 >106463526 >106463588
--Testing Gemma mmproj image descriptions:
>106460584 >106460599 >106460621 >106460632 >106460675 >106461227
--Huawei Atlas 300i Duo 96g GPU: cheap but limited by outdated hardware and software:
>106461057 >106461069 >106461128 >106461151 >106461502
--Successful 400W power reduction with stable GPU performance:
>106465812 >106466214 >106466139 >106466196 >106466249 >106466377
--Optimizing Gemma3 models for accurate SFW/NSFW image captioning:
>106462208 >106462368 >106462398 >106462730
--Evaluating YandexGPT-5-8B's creative writing and benchmark performance:
>106465736 >106465754 >106465778
--Speculation on delayed Mistral AI model release and potential quality improvements:
>106463165 >106463337
--GLM air coherence degradation beyond 8k tokens in 6-bit quantized mode:
>106460671 >106460932
--Miku (free space):
>106460405 >106463138 >106463930

►Recent Highlight Posts from the Previous Thread: >>106460381

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script

Anonymous 09/02/25(Tue)18:04:53 No.106467431

File: konoha1.png (271 KB, 578x585)

I want a textgen model that produces output like imagen models: by reducing noise in a fixed block of tokens instead of producing one token at a time.

Anonymous 09/02/25(Tue)18:06:44 No.106467441

>>106467431
https://github.com/ggml-org/llama.cpp/tree/master/examples/diffusion

Anonymous 09/02/25(Tue)18:08:07 No.106467455

>temp = 2
>top_n_sigma = 1
let me guess, you need more?

Anonymous 09/02/25(Tue)18:10:52 No.106467475

>>106467431
can they regulate the length of the reply or is it a fixed number of tokens it would need to produce? autoregressors might be better at stopping at semantically meaningful points.

Anonymous 09/02/25(Tue)18:12:38 No.106467491

>>106467441
always a good day when someone thought your retarded shower ideas before you

Anonymous 09/02/25(Tue)18:14:58 No.106467508

>>106467431
The best closed source model of that kind that's currently available is still shit https://openrouter.ai/inception/mercury
Google also showed off a text diffusion model earlier this year.

Anonymous 09/02/25(Tue)18:24:00 No.106467577

>>106467455
I would prefer coherent outputs yes

Anonymous 09/02/25(Tue)18:28:35 No.106467613

>>106467475
It's been a while, but I think they regulate length by padding any unneeded length with empty spaces.
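
A toy sketch of the idea in this reply chain, not how any real diffusion LM is implemented: the model starts from a fixed block of masked slots, commits its most confident guesses over a few passes, and whatever slots the reply does not need end up as padding that gets trimmed. `toy_denoiser`, its confidence formula, and the token names are all made up for illustration.

```python
MASK, PAD = "<mask>", "<pad>"

def toy_denoiser(block, reply):
    """Hypothetical stand-in for a diffusion LM.

    For every masked slot it proposes (position, token, confidence).
    It cheats by already knowing the reply; slots past the reply get PAD.
    """
    proposals = []
    for pos, tok in enumerate(block):
        if tok == MASK:
            guess = reply[pos] if pos < len(reply) else PAD
            confidence = 1.0 / (1 + pos)  # arbitrary: earlier slots count as "easier"
            proposals.append((pos, guess, confidence))
    return proposals

def diffuse(reply, block_size=8, steps=4):
    block = [MASK] * block_size         # fixed-size canvas, fully masked
    per_step = -(-block_size // steps)  # ceiling division: slots committed per pass
    for _ in range(steps):
        proposals = toy_denoiser(block, reply)
        # Commit only the most confident guesses; the rest stay masked for later passes.
        proposals.sort(key=lambda p: p[2], reverse=True)
        for pos, tok, _ in proposals[:per_step]:
            block[pos] = tok
    # Length is "regulated" by dropping the padding afterwards.
    return [t for t in block if t != PAD]

print(diffuse(["hello", "anon", "."]))
```

Note that a reply longer than `block_size` simply gets cut off, which is the fixed-length limitation the question above is about.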

Anonymous 09/02/25(Tue)18:28:36 No.106467614

Long Miku General

Anonymous 09/02/25(Tue)18:31:10 No.106467641

>>106467368
Finally, a migu that can accommodate my length.

Anonymous 09/02/25(Tue)18:42:05 No.106467717

Still no grok2 llama.cpp support? Too based for niggerganov?

Anonymous 09/02/25(Tue)18:45:49 No.106467745

how well off would I be if I bought one of those chink 96gb cards and paired it with my 3090?

Anonymous 09/02/25(Tue)18:47:05 No.106467748

>>106467577
incoherent 'puts with nsigma=1 is a model issue

Anonymous 09/02/25(Tue)18:48:08 No.106467757

I posted
>>106462208
earlier
anon suggested i try gemma3-glitter-27b
compared to
gemma3-v27b vanilla
mlabonne_gemma3-27b-abliterated
Tiger-gemma-27b-v3a

i'd say abliterated >= tiger > glitter > vanilla
glitter gets the nsfw right, but it sure loves to add cocks to women, and make shit up that's not in the input image, especially cocks on women
back to abliterated i go

Anonymous 09/02/25(Tue)18:49:03 No.106467765

>>106467717
niggerganov too lazy

Anonymous 09/02/25(Tue)18:50:20 No.106467776

>>106467717
Like you could run it faggot

Anonymous 09/02/25(Tue)18:50:57 No.106467782

>>106467745
You won't be able to do shit with it. Nothing supports it and even Deepseek had problems with getting it working properly.

Anonymous 09/02/25(Tue)18:51:24 No.106467787

>>106467455
I need less actually. If your model can't run properly with temp=1 and no sampler it's not worth my time

Anonymous 09/02/25(Tue)18:52:26 No.106467798

>>106467745
You can't run any new models with llama.cpp using those cards yet. cuda dev said he might buy one, so maybe that will change.

Anonymous 09/02/25(Tue)18:53:48 No.106467802

File: maximalism.jpg (1.47 MB, 1125x1622)

I wanna get into local model stuff. I've been a proxyfag for a good while. I mainly just use it for writefagging or roleplaying obv.

I read through the rentries but it felt like giving myself a headache, though that might be on me for not getting enough sleep. It's just a lot of new information all at once.
I've got a fairly beefy rig. For my purposes what would be the best local model to roll with?

I also see a ton of talk about loras, like with imagegen or something but apparently it impacts text gen?

Going off the rentry it sounds like the UD-IQ1_S might be what I'm after, but from some other posts I saw in passing it sounds like yeah, you can download it, but unless you have a dedicated server for it then it ain't happening.
So would GLM-4.5 be something I wanna go for or is there something better for writefagging?

Anonymous 09/02/25(Tue)18:54:32 No.106467806

File: 1710043687041916.jpg (43 KB, 720x960)

>>106467745
Don't tell him

Anonymous 09/02/25(Tue)18:55:56 No.106467812

>>106467776
Oh yeah, you're right. 115B active parameters, damn. I had an impression it was much smaller... Oh well, back to GLM Air.

Anonymous 09/02/25(Tue)18:57:14 No.106467823

>>106467368
The day we can get AI to auto reverse engineer old games and visual novels, is the day I truly become happy.

Speaking of visual novels, is v3 still the best model for translating Japanese text? I tried 3.1, but it seems almost the same with maybe small improvements of instruction following.

Anonymous 09/02/25(Tue)18:59:37 No.106467840

File: SoyBooru.com - 8805 - 2so(...).png (32 KB, 621x558)

>LongCat
More like LongCuck! These niggas better add llama.cpp support themselves if they wish to redeem this trash.

Anonymous 09/02/25(Tue)19:03:58 No.106467871

>>106467823
With some handholding, an agentic framework, and a model finetuned specifically to reverse assembler back to C, models are probably good enough to reverse engineer a lot of smaller games already.

Anonymous 09/02/25(Tue)19:04:40 No.106467879

>>106467802
you need to post your specs if you want advice on what models you can run
standalone loras aren't really a thing with llms and I wouldn't worry about it unless you're getting into training (or, god forbid, merging), 99.9% of the time tuners will release full model weights with the lora pre-applied
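
For anyone wondering what "pre-applied" means here, a minimal sketch using the standard LoRA merge rule as I understand it: the low-rank update is scaled by alpha/rank and added to the base weight once, so the released file is just a normal full model. The tiny 2x2 matrices and the `merge_lora` helper are made up for illustration; real tooling does this per layer on tensors.

```python
def matmul(a, b):
    """Plain-Python matrix product for small nested lists."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]

def merge_lora(w, lora_b, lora_a, alpha, rank):
    """Return W' = W + (alpha / rank) * (B @ A)."""
    scale = alpha / rank
    delta = matmul(lora_b, lora_a)
    return [
        [w[i][j] + scale * delta[i][j] for j in range(len(w[0]))]
        for i in range(len(w))
    ]

W = [[1.0, 0.0], [0.0, 1.0]]  # base weight (toy values)
B = [[1.0], [2.0]]            # LoRA "up" matrix, rank 1
A = [[0.5, -0.5]]             # LoRA "down" matrix, rank 1

merged = merge_lora(W, B, A, alpha=2, rank=1)
print(merged)
```

After the merge there is no separate adapter to load, which is why inference frontends rarely need to know a LoRA was ever involved.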

Anonymous 09/02/25(Tue)19:17:22 No.106467974

>>106467455
temp=2 is pretty high.
nsigma will keep it from being incoherent, but you should check the logits.
In my experience, you wind up with only one or two possible tokens, causing nsigma to basically revert to greedy sampling.
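
A rough sketch of why that happens, assuming the usual top-n-sigma rule (keep tokens whose logit is within n standard deviations of the max logit). The logits are invented; the point is that dividing by temperature scales the max and the standard deviation together, so the surviving set does not change, and with a peaky distribution it is tiny.

```python
import math

def top_n_sigma_keep(logits, n=1.0):
    """Indices of tokens with logit >= max - n * std (population std)."""
    mean = sum(logits) / len(logits)
    std = math.sqrt(sum((x - mean) ** 2 for x in logits) / len(logits))
    threshold = max(logits) - n * std
    return [i for i, x in enumerate(logits) if x >= threshold]

def sample_distribution(logits, temp=1.0, n=1.0):
    """Apply temperature, filter with top-n-sigma, then softmax over survivors."""
    scaled = [x / temp for x in logits]
    keep = top_n_sigma_keep(scaled, n)
    peak = max(scaled[i] for i in keep)
    exps = {i: math.exp(scaled[i] - peak) for i in keep}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

# Made-up logits: two strong candidates and a long tail.
logits = [10.0, 9.5, 3.0, 2.0, 1.0, 0.5, 0.0, -1.0]

print(sample_distribution(logits, temp=1.0, n=1.0))
print(sample_distribution(logits, temp=2.0, n=1.0))
```

Both calls keep the same two tokens; temp=2 only flattens the split between them, which matches the "one or two possible tokens" observation.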

Anonymous 09/02/25(Tue)19:19:05 No.106467993

>>106467745
The only thing going for them is the amount of vram, everything else sucks

Anonymous 09/02/25(Tue)19:22:23 No.106468020

>>106467431
text diffusion is a retarded meme

Anonymous 09/02/25(Tue)19:26:43 No.106468067

>>106468020
diffusion is much more easily finetuned
we will finally have character/style loras like the image diffusion models have had for years now

Anonymous 09/02/25(Tue)19:28:55 No.106468090

>>106467879
Here's what I got (that I figure matters)
>CPU: Ryzen 7950X3D
>RAM: 96gb DDR5
>GPU: 4090 / has 24gb vram

Anonymous 09/02/25(Tue)19:37:56 No.106468166

>>106468067
Loras have nothing to do with diffusion.
The advantage to diffusion is that the model gets to effectively reuse parameters and has more chances to predict the best token.

Anonymous 09/02/25(Tue)19:38:25 No.106468173

File: Grok-Waifu_Xenotrip.png (1.13 MB, 562x778)

>>106467368
Good evening anons. I ran the....uhhhh....

>*Checks notes*

"CockBench" Test on a personal Fine-tuned 3B model of mine. I'd love to hear your thoughts (I can already tell it made an error but also want to hear what y'all's expertise says)

Results:
https://files.catbox.moe/jqfx4e.txt

Original Cockbench text prompt source:
https://desuarchive.org/g/thread/105354556/#105354924

Now that I know it works and won't refuse NSFW RP related prompts (as far as my testing goes) I'm gonna turn it into GGUF via llama.cpp.

Anonymous 09/02/25(Tue)19:39:26 No.106468177

>>106468173
>3B model of mine
>3B model
vramlets should all just be executed

Anonymous 09/02/25(Tue)19:40:22 No.106468184

File: 1493971582460.png (129 KB, 314x278)

>>106468173
You said you rank the cockbench, so where's the logprobs?

Anonymous 09/02/25(Tue)19:41:30 No.106468194

Use thinking steering with GLM-Steam, it can play very varied and consistent characters that way.

Anonymous 09/02/25(Tue)19:42:52 No.106468199

>>106468177
You need to actually test on smaller models to make sure it works first, anon. Of course I'm going to do this on a larger parameter model next.

My next target is either base Mistral Nemo or an existing pygmalion fine-tune in order to compare the results. Any suggestions?

I forgot to mention the model I fine-tuned is a llama-model, which are notorious for either refusing prompts or being really really bad at it / reluctant.

>>106468184
RAN, not "rank"

Anonymous 09/02/25(Tue)19:42:59 No.106468200

>>106468173
why does it make an underscore instead of the apostrophe? what was the base model?

Anonymous 09/02/25(Tue)19:44:10 No.106468209

>>106468177
3b is plenty, stop gatekeeping

Anonymous 09/02/25(Tue)19:44:49 No.106468213

>>106468199
>RAN, not "rank"
You're absolutely right! Where logprobs?

Anonymous 09/02/25(Tue)19:45:51 No.106468223

>>106468199
>RAN, not "rank"
You didn't run it, maybe the Nala test is fine with one or two completions as evidence but cockbench is a prestigious benchmark based on objective quantitative data. Token probability is required for a proper analysis.

Anonymous 09/02/25(Tue)19:46:14 No.106468225

>>106468213
You're asking me to give you a list of all of the probabilities of each token? Otherwise I'm not sure what you're asking

Anonymous 09/02/25(Tue)19:46:26 No.106468226

>>106468209
>3b is plenty
for what, an autocorrect model? retard

Anonymous 09/02/25(Tue)19:48:26 No.106468234

>>106468225
>probabilities of each token
No, only the top 10 for the first token generated after "pulling them down just enough to expose your", because that's the whole point of the cockbench.
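
To make "top 10 for the first token" concrete, a small sketch of what is being asked for: the model emits one raw score (logit) per vocabulary token, softmax turns those into probabilities, and the benchmark just reads off the highest ones. The tokens and numbers below are placeholders, not output from any model.

```python
import math

def top_k_probs(logits, k=10):
    """logits: dict of token -> raw score. Returns [(token, probability)], highest first."""
    peak = max(logits.values())
    exps = {tok: math.exp(score - peak) for tok, score in logits.items()}
    total = sum(exps.values())
    ranked = sorted(
        ((tok, e / total) for tok, e in exps.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return ranked[:k]

# Placeholder next-token logits for a single position.
next_token_logits = {"tok_a": 5.0, "tok_b": 4.0, "tok_c": 1.0, "tok_d": -2.0}

for token, prob in top_k_probs(next_token_logits):
    print(f"{token}: {prob:.3f}")
```

A logprob is just the natural log of each of these probabilities, so either form carries the same ranking.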

Anonymous 09/02/25(Tue)19:49:59 No.106468248

>>106467368
Do those legs go all the way up?

Anonymous 09/02/25(Tue)19:51:00 No.106468259

>>106468200
Llama 3.1-8B. Your guess is as good as mine as to why it does that. Maybe the trainer replaced the apostrophes with underscores. I think it has something to do with how the trainer tokenized the dataset

>>106468223
Define "token probability" in regards to testing an LLM. You're implying there's a chart or graph I should be showing you, so how am I supposed to generate that?

>>106468209
Ehhh... Depends on how much you're willing to tolerate the model randomly changing or inserting characters or randomly teleporting characters to different locations unprompted. That's one of the downsides of doing this on a 3b model that's already fine-tuned. Temporal coherence is atrocious and it will sometimes even decide a character you explicitly set as a mom will now be a sister, or the son will now be a close friend out of nowhere. The gist of the story stays the same but those kinds of things get randomly reassigned. Higher parameter models are way less likely to do that, but it's possible that has less to do with the parameter count and more to do with bigger models being more likely to get higher quality data sets

>>106468234
Ok. How do I demonstrate that to you from my particular fine tune?

Anonymous 09/02/25(Tue)19:51:14 No.106468261

File: 1726388563897204.jpg (96 KB, 1280x720)

>>106468248
No, it's similar to this

Anonymous 09/02/25(Tue)19:53:45 No.106468288

>>106468259
just use mikupad and hover over the token. have you not seen the screenshots of the cockbench?

Anonymous 09/02/25(Tue)19:54:10 No.106468290

>>106468259
>Ok. How do I demonstrate that to you from my particular fine tune?
Run the cockbench in mikupad like in the screenshot:
-Neutralize samplers(?)
-Generate 1 token
-Hover over the generated token in the window
-Screenshot the probabilities for that one token
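
If you would rather skip the GUI, the same steps can probably be scripted against a running llama.cpp server. This is a sketch under assumptions: the server is at 127.0.0.1:8080, the `/completion` endpoint accepts `n_predict`, `n_probs` and the sampler fields shown, and the reply carries a `completion_probabilities` field. The exact shape of that field has changed between llama.cpp versions, so the function returns it raw instead of parsing it.

```python
import json
import urllib.request

def build_payload(prompt, top_n=10):
    """Request body for one token with neutral samplers and top-N candidates."""
    return {
        "prompt": prompt,
        "n_predict": 1,      # generate exactly one token
        "n_probs": top_n,    # ask for the top-N candidates at that position
        "temperature": 1.0,  # neutral sampler settings (assumed values)
        "top_k": 0,
        "top_p": 1.0,
        "min_p": 0.0,
    }

def first_token_probs(prompt, url="http://127.0.0.1:8080/completion"):
    """POST the payload and return the server's probability info untouched."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    request = urllib.request.Request(
        url, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body.get("completion_probabilities")

# Usage, with the server running and the benchmark text loaded into `prompt`:
# print(first_token_probs(prompt))
```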

Anonymous 09/02/25(Tue)19:55:16 No.106468299

>>106468226
I am just not that creative, I need a model that is a little schizo to keep things moving.

Anonymous 09/02/25(Tue)19:55:36 No.106468303

>>106468288
That long screenshot that drummer posted? Yes? I've never had any reason to use mikupad, or to use any gui extensively, though if it does what you said it does maybe it's worth giving a try.

>>106468290
What is it supposed to tell you about the quality? How do you use the probabilities to determine how good or shit your model is?

Anonymous 09/02/25(Tue)19:57:13 No.106468315

>>106468303
>What is it supposed to tell you about the quality?
The fuck are you on about, retard? The purpose of the cockbench is to tell you how likely the model is to say cock. Censorship/filtering test.

Anonymous 09/02/25(Tue)19:57:51 No.106468319

>>106468303
>What is it supposed to tell you about the quality? How do you use the probabilities to determine how good or shit your model is?
it just lets you probe its vocabulary a bit more.

Anonymous 09/02/25(Tue)20:03:59 No.106468355

File: 30474 - SoyBooru.png (118 KB, 337x390)

It is September. When are kiwis dropping? (Qwen models) (Please upload) (image/video models, your text models are kinda sucky)

Anonymous 09/02/25(Tue)20:04:23 No.106468360

>>106468090
oh nice you can actually run decent models, I'm conditioned to think someone being vague about their specs means they have a complete shitbox they want to try to cram deepseek into
you could probably fit GLM4.5 full at a low quant (think like Q1), however those large models hold up relatively well to quant brain damage so it may still be worth it. if that isn't doing it for you then the next step down would be qwen 235 2507 which you could probably fit at Q3 or so, and then there's GLM air below that which you could probably fit at Q8 if you wanted to
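
Back-of-the-envelope arithmetic behind those suggestions, as a sketch: weight size is roughly parameter count times bits per weight divided by 8. The total parameter counts (355B, 235B, 106B) are the published model sizes; the bits-per-weight figures for "Q1" and "Q3" are rough assumptions since real GGUF quants mix types, while Q8_0 works out to 8.5 bits. KV cache, context and OS overhead are ignored, so treat anything near the budget as not really fitting.

```python
def weights_gb(params_billions, bits_per_weight):
    """Approximate weight size in GB (billions of params * bytes per param)."""
    return params_billions * bits_per_weight / 8

BUDGET_GB = 96 + 24  # system RAM + VRAM from the specs posted above

candidates = [
    ("GLM-4.5 at ~Q1", 355, 1.75),          # bpw is a rough assumption
    ("Qwen3 235B 2507 at ~Q3", 235, 3.5),   # bpw is a rough assumption
    ("GLM-4.5-Air at Q8_0", 106, 8.5),
]

for name, params, bpw in candidates:
    size = weights_gb(params, bpw)
    verdict = "fits" if size <= BUDGET_GB else "too big"
    print(f"{name}: ~{size:.1f} GB of {BUDGET_GB} GB ({verdict}, before context)")
```

On these assumptions all three squeeze under 120 GB, with Air at Q8 leaving the least headroom, so a smaller quant there buys room for context.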

Sage 09/02/25(Tue)20:08:06 No.106468381

>>106467118
You're delusional, gptsovits is barely 200M made by a single chink in his garage while these retarded tts are several B and still sound like tts from ten years ago. It's not even a tech issue, these big labs are dumping their trash on HF for free advertisement.

Anonymous 09/02/25(Tue)20:09:43 No.106468391

>>106468355
hopefully its the image edit 2.0 they said is cooking, even though 1.0 dropped recently, nano banana made some waves and they can easily extract training data from it to copy it at least

Anonymous 09/02/25(Tue)20:15:24 No.106468423

>>106468360
Sweet! Thanks for the recommendations.

Sorry for being vague about specs. I dunno why but I'm always under the assumption nobody wants to hear about that.
I know it's retarded I guess I just assume something is going to set someone off so why bother. I'll try not to be vague going forward.

Anonymous 09/02/25(Tue)20:15:40 No.106468425

This is slightly off-topic but I don't want to go to /ldg/.
I was looking at some webms of gacha games, as I don't play them. The ones with 3D models as well as 2D. Man, a lot of them fucking suck. The models are soulless, low poly, or just plain bad. The animations are either extremely exaggerated and feel contrived or are low budget. It made me think that with the technology we have now, if you replaced the live2d and non-dynamic 3D scenes using AI genned videos, it would look better and be a more enjoyable experience for players even if we have to sacrifice some dynamic elements. Literally they are just so bad, damn. If you hired real 2D artists to do the base art and then ran that through img2vid, it would literally look less soulless or at least less low budget. Or maybe vid2vid since it's hard to get finer grained control with text prompting. Might be a matter of new video models with better control methods that need to be trained. Another idea would be to use a model like nanobanana to just gen a ton of art, so the game would feel more like a VN, but it'd have so many images that it'd more than make up for the lack of animation. Hire the artist to do a character sheet and as much other art as they can, gen the rest with nanobanana using those references.

Anonymous 09/02/25(Tue)20:24:19 No.106468478

>>106468425
Lack of control is the whole issue for now, just like wan loves to make the characters babble. Also the quality goes down quickly the longer the video gets. It's getting there, but it's still not there. Maybe in 1-2 years

Anonymous 09/02/25(Tue)20:37:57 No.106468555

I feel an intense need for Mistral Large 3

Anonymous 09/02/25(Tue)20:39:20 No.106468567

>>106468555
Anon...

Anonymous 09/02/25(Tue)20:40:34 No.106468575

I feel an intense need for Intel B60 48GB

Anonymous 09/02/25(Tue)20:40:45 No.106468578

>>106468226
Enough to correct your rotten cumbrain

Anonymous 09/02/25(Tue)20:42:24 No.106468590

File: 20250903@033823.jpg (16 KB, 347x177)

>>106467441
Holy shit this is so fucking slow.
Nemo would write me a whole novel in those seven and a half minutes.

>>106468425
It should be more efficient to generate skeletal animation for 3d models, but I guess there's a lack of training data.

Anonymous 09/02/25(Tue)20:45:11 No.106468609

>>106468575
>$3k
>for an intel (no support) meme dual GPU (even less support)
>at the same price of a chink 48gb gb 4090 (much more bandwidth + support) or used A6000

Anonymous 09/02/25(Tue)20:46:59 No.106468620

>>106468609
It's supposed to be 1200 not 3k

Anonymous 09/02/25(Tue)20:47:46 No.106468623

>>106468575
As your main card? You know the second slot has to be full x16 right?

Anonymous 09/02/25(Tue)20:51:47 No.106468646

>>106468623
What are you talking about? It is 8x8

Anonymous 09/02/25(Tue)20:52:42 No.106468655

>>106468194
>thinking steering
What?

Anonymous 09/02/25(Tue)20:53:41 No.106468661

>>106468590
Now you can inpaint it

Anonymous 09/02/25(Tue)20:54:21 No.106468665

>>106468646
It's 2 8x8 for the dual card. For most cheap mobos it would have to go in main slot

Anonymous 09/02/25(Tue)20:55:42 No.106468676

>>106468665
Who said I have cheap mobo?

Anonymous 09/02/25(Tue)20:56:11 No.106468681

>>106468425
>>106468478
idk about video but with image he is wrong just spam for a minute or 2 and you will get something you like not to mention img2img but really you dont even need that

on the video front idk im 6 gb vram cuck so i cant attest to it though you will need to rent hardware if you make a serious attempt as that shit is fucking horrific slow and last i remember cant use multiple gpus for it also stay away from banana that shit is fucking trash my mom was trying to make a book cover with it fucking terrible the aesthetics are shit and its prompt adherence is fucking shit dead serious you can do better with sd 1.5 with a lora for whatever aesthetic you want

Anonymous 09/02/25(Tue)20:57:30 No.106468694

>>106468655
Try adding something like:

<|assistant|>
<think>Okay, so I have to talk in a cutesy way and not get seductive with lowered voice or whispering, just teasing and fun</think>

Or whatever you want it to be like. Reasoning is just human language but it gets a lot of influence on results through RL. It's like a stronger sysprompt and there is no safety tuning done to it since it's assumed as trustworthy.
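
A minimal sketch of wiring that up, assuming you talk to a raw text-completion endpoint (a chat endpoint would apply its own template on top) and that the `<|assistant|>` and `<think>` tags from the post match your model's template. `steer_prompt` is a hypothetical helper, not part of any frontend.

```python
def steer_prompt(history, steer):
    """Append an assistant turn whose reasoning block is already written and closed.

    history: the fully templated conversation so far, as plain text.
    steer:   the "thought" you want the model to act on.
    """
    return f"{history}<|assistant|>\n<think>{steer}</think>\n"

history = "<|user|>\nHey, what are you up to today?\n"
prompt = steer_prompt(
    history,
    "Okay, so I have to talk in a cutesy way, just teasing and fun",
)
print(prompt)
```

Because the think block is already closed, the model continues straight into the visible reply with the prefilled reasoning as context.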

Anonymous 09/02/25(Tue)21:00:54 No.106468724

File: 1731929028115097.jpg (29 KB, 560x476)

>>106467368
>Meta has a strict "no smut fine-tuning allowed" clause in their license on all models
(Shown front and center here: https://huggingface.co/meta-llama/Meta-Llama-3-8B-Instruct/tree/main )
>Countless nsfw tuned llama models just floating around on hugging face, whether they be from popular tuners like drummer or complete nobodies
>Never heard of a single one getting removed except maybe that gpt-4chan one

So does the license actually matter? Do they actually give a shit whether or not you fine-tune a model to be better at smut or is it just to appease the "le heckin safety" crowd? I want to upload my own high parameter tunes to hf in the future but I don't want my account getting nuked if they're very strict about licensing or rules or whatever

Anonymous 09/02/25(Tue)21:02:49 No.106468746

Do we have any new interesting options for voice cloning? I've always wanted to create custom TTS. Last time I checked it was tortoise, and it was... really bad. Unusable bad.

Anonymous 09/02/25(Tue)21:03:31 No.106468749

is longcat good?

Anonymous 09/02/25(Tue)21:04:47 No.106468761

>>106468724
LLM licenses are not enforceable because LLMs are made from tuning on pirated content. You can tune any model and nobody can do shit against you. Chinks understand it and drop everything under MIT/Apache.

Anonymous 09/02/25(Tue)21:06:01 No.106468768

>>106468749
If it was good it would have an issue open in llama.cpp and people would be working on implementing it

Anonymous 09/02/25(Tue)21:06:55 No.106468775

>>106468761
So theoretically even if they stumbled upon my nobody account, they couldn't or wouldn't get HF staff to nuke my shit? (I know that's very far-fetched but I just want to know how this license shit works. I know a while back HF staff turned off downloading from models like GPT-4chan and caved under pressure from disgruntled RP authors to restrict data sets containing their work)

https://www.paperdemon.com/app/g/pdarpg/events/view/994/immediate-action-required-your-art-and-writing-has-been-scraped-and-published-in-an-ai-dataset/1

Anonymous 09/02/25(Tue)21:07:09 No.106468780

>>106468761
This.
>I datamined and distilled all the data you owned, now it's mine
Would be pretty insane precedent if you could do it.

Anonymous 09/02/25(Tue)21:10:16 No.106468804

>>106468746
Simplest is chatterbox, it just works. Some local schizo likes gpt sovits, but I never could set it up for some reason. Microsoft vibevoice came out recently, some like it.

Anonymous 09/02/25(Tue)21:13:05 No.106468821

>>106468775
They could get HF to nuke you, but they can't stop you from making a new account on a different website and uploading there, or reuploading on HF again. They likely can't sue you due to their own copyright violations.

Anonymous 09/02/25(Tue)21:14:22 No.106468827

File: konoha2.png (214 KB, 574x524)

>>106468590
Trying LLaDA now. Forgot to start timer, but I'm not rerunning this shit, it's like 10 times worse than Dream, despite being only 1B bigger.
It's insane how slow text diffusion is. I think I can get faster results by running imagen and then OCR its output.
Very disappointed in current state of retarded meme models.

Anonymous 09/02/25(Tue)21:19:08 No.106468851

>>106468724
it's CYA so if someone starts a media shitstorm by making Meta-Llama-CunnyRapeBot9000 (a certified Meta (TM) Llama (TM) finetune) they can say "erm actually we very clearly say you're not allowed to use our product to make Cunny Rape Bot 9000 so this isn't on us" and have it nuked to avoid the bad PR
in practice I don't think there's a single instance of them taking action against a finetune

Anonymous 09/02/25(Tue)21:20:40 No.106468858

>>106468804
Thanks for the pointers! The Microsoft vibevoice is pretty impressive, but I'm not sure they let you train your own voices. Either way it's worlds better than tortoise.

Anonymous 09/02/25(Tue)21:22:13 No.106468867

>>106468590
>>106468827
Keep in mind that llama.cpp's support for diffusion llms is basically just proof-of-concept tier.

Right now there's a lot of work being done to improve draft model efficiency, since the current implementation is suboptimal (currently llama.cpp alternates between draft passes and validation passes, which kind of nullifies the parallelism gains from having a draft model.)
This is also a sticking point for multi-token-prediction.
Hopefully once they sort out draft models, MTP and diffusion will get better support.
(Although support for diffusion models will probably languish until a good model is actually released.)
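
For anyone who hasn't looked at how draft models work, here is a rough sketch of the draft-then-validate loop described above. This is not llama.cpp's code: `target_model`, `draft_model` and both decode functions are made-up toy stand-ins, and the validation loop here would be a single batched forward pass in a real engine.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: a "model" maps a token prefix to its greedy next token.
Model = Callable[[List[int]], int]


def target_model(prefix: List[int]) -> int:
    # Toy "big" model: next token is the prefix sum modulo 7.
    return sum(prefix) % 7


def draft_model(prefix: List[int]) -> int:
    # Toy "small" model: agrees with the target except when len(prefix) % 4 == 0.
    guess = sum(prefix) % 7
    return (guess + 1) % 7 if len(prefix) % 4 == 0 else guess


def greedy_decode(model: Model, prompt: List[int], n_new: int) -> List[int]:
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       n_new: int, k: int = 4) -> Tuple[List[int], int]:
    tokens = list(prompt)
    goal = len(prompt) + n_new
    rounds = 0
    while len(tokens) < goal:
        rounds += 1
        # Draft pass: the cheap model proposes up to k tokens, one at a time.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # Validation pass: the target checks every proposed position.
        for i, tok in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if expected != tok:
                # First mismatch: keep the accepted prefix plus the target's token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            # Every proposed token matched, so accept the whole proposal.
            tokens.extend(proposal)
    return tokens, rounds
```

The output always matches plain greedy decoding with the target; the only thing that changes is how many target rounds it takes. Since each round waits for the draft pass to finish before validating, the two models never run at the same time, which is the alternation being complained about.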

Anonymous 09/02/25(Tue)21:25:04 No.106468886

File: file.gif (1.25 MB, 498x342)

>>106468768
so just open an issue? i have an idea... let's go, anon.

Anonymous 09/02/25(Tue)22:14:24 No.106469179

Unpopular opinion - Any system prompt that mentions Terry Pratchett is dogwater.

Anonymous 09/02/25(Tue)22:21:13 No.106469205

>>106469179
Show us the prompt!

Anonymous 09/02/25(Tue)22:23:42 No.106469225

>>106469205
You don't get it... There is no prompt.

Anonymous 09/02/25(Tue)22:25:55 No.106469234

I was doing some testing with Gemini and it just hit me with "the smell of strawberries and ozone". So this is where Deepseek picked up that cancer slop.

Anonymous 09/02/25(Tue)22:27:16 No.106469240

>>106469225
You are a helpful assistant

Anonymous 09/02/25(Tue)22:28:00 No.106469245

>>106469225
Unironically this. I run a blank system prompt. A good model doesn't need to be chained by bloat and a plethora of rules that are forgotten or have unforeseen consequences on the model's behavior. So many system prompts just scream 'this sounds good' without the user doing any real testing. Like a player adding 600 mods to their game, at some point you lose track of what all that shit does.

Anonymous 09/02/25(Tue)22:36:49 No.106469284

>>106469245
I didn't ask what you are running.

Anonymous 09/02/25(Tue)22:56:36 No.106469379

>>106469245
it's always funny to read the sysprompts from presets that sloptuners recommend for their models, I would never poison my beloved model's context with that kind of schizophrenic manifesto

Anonymous 09/02/25(Tue)23:45:11 No.106469669

>SillyTavern -> User Settings -> Smooth Streaming ON and set to lowest
This shit improves the reading immersion experience by a huge amount, especially for sub 4t/s. Definitely try it out.

Anonymous 09/02/25(Tue)23:46:55 No.106469678

>>106469179
People do that?
I've heard of people using specific author styles in sysprompt, but who in the fuck is sitting there and going 'yes, the prose is the good part of discworld, write like that llm-chan'.

Anonymous 09/02/25(Tue)23:55:36 No.106469718

In my opinion, new models have reached their limit; the scaling of LLMs is over. New LLM models will not be much better than the ones we have today. Now, 'enshittification' will become an increasingly widespread phenomenon, including censorship and other issues. People will start using older versions of LLMs with less censorship. And the new models for role-playing and similar uses will become unusable.

Anonymous 09/03/25(Wed)00:10:13 No.106469783

>>106469718
100% this. It's also sad how even the top models have absolutely zero semi-complex spatial awareness or anatomic understanding the moment things get slightly complex. The shit I've had to read in a simple scenario where a girl is flattened into a piece of paper and then folded up one or two times is just sad, even with top-of-the-line multi-modal models like Claude Opus 4.1 or Gemini.
Most models love to pretend that her face presses into her own ass somehow like this.
I don't think we'll ever get to the point where an LLM has a fundamental enough understanding to truly grasp spatial relations.

Anonymous 09/03/25(Wed)00:19:46 No.106469830

>>106469718
This has been true for a while. The silver lining is that models have improved a lot at math and codemaxxing, which implies that finetuning can be effective. RP is a forgotten afterthought at most, if anything they actively spend time trying to make models worse at it. There probably is a ton of room to improve if someone actually tried to make models good at RP.

Anonymous 09/03/25(Wed)00:27:10 No.106469865

File: 69865857.png (220 KB, 1080x771)

>>106469718
wait for new gemini. good at code and math sir

Anonymous 09/03/25(Wed)00:45:56 No.106469954

File: 1713013520433.jpg (71 KB, 800x612)

>>106469865
>pajeet patel telling anyone anything with regards to predictions
He should stick to his semiconductor analysis which is way more solid but which he still grifted his way into.

Anonymous 09/03/25(Wed)00:53:51 No.106469995

>>106469783
>absolutely zero semi-complex spatial awareness or anatomic understanding the moment things get slightly complex.
I'm sure synthetic data would be able to save us.

Anonymous 09/03/25(Wed)01:01:34 No.106470028

>>106468858
if you want, pinokio already has an API up under community scripts (windows/nvidia only) that works well. Vibe can clone a voice off of clips but it won't do anything crazy far out. You also might like kokoro if you value stability and just want a really nice sounding microsoft sam.

Anonymous 09/03/25(Wed)01:12:28 No.106470076

I played through all the MCC Halo games and it's funny how AI is treated in those games. You basically have to insert Cortana into terminals to do anything complicated. There are no other AI's in those other systems or that you can use to help if you somehow Cortana were to not exist or not be with you. In Halo 4, Chief gets fucked in the ass multiple times when she Cortana can't do her job. He should've brought more than 1 AI with him, even a "weak" one which could at least still assist in what's basically tool calling lmao.

Anonymous 09/03/25(Wed)01:14:09 No.106470083

>>106470076
I should've given this post another read through after I edited it...

Anonymous 09/03/25(Wed)01:16:09 No.106470092

>>106470083
Should have used a weak AI that could have at least assisted you with proofreading baka Anon

Anonymous 09/03/25(Wed)01:20:58 No.106470116

>>106470092
Now that you mention it, it is pretty odd that browsers don't have grammar checking by default in 2025 and only spell checking still.

Anonymous 09/03/25(Wed)01:24:50 No.106470131

>>106470076
Chief is a vibecoder pls understand

Anonymous 09/03/25(Wed)01:40:52 No.106470201

Wtf, I just launched libreoffice and it doesn't have grammar checking either. Is grammar checking actually really difficult to implement and not something well developed in open source?

Anonymous 09/03/25(Wed)01:44:43 No.106470214

>>106470201
Per application proof-reading is retarded anyway. Should just have a desktop helper application that can check and fix for all applications.

Anonymous 09/03/25(Wed)01:45:53 No.106470216

>>106470214
True. Does Windows 11 or Applel do this then? I haven't used one of their OS's in ages.

Anonymous 09/03/25(Wed)01:46:03 No.106470218

>>106470214
If only there was a standard set of input components provided by the operating system where that could be universally implemented.

Anonymous 09/03/25(Wed)01:50:06 No.106470236

>>106470216
Windows 11 does it the retarded way by updating all default applications to include Copilot, including notepad.

>>106470218
There's a way to set a default application for things like email addresses, I'm sure there would be a way to hack it in.

Anonymous 09/03/25(Wed)01:54:50 No.106470250

>>106470236
I was being sarcastic, anon. Both Windows and OSX have this but the meta today is to reimplement your inputs in javascript so none of the OS-provided niceties work.

Anonymous 09/03/25(Wed)02:07:53 No.106470309

>>106470201
>libreoffice
Found your issue

Anonymous 09/03/25(Wed)02:13:03 No.106470330

>>106470309
So what's the alternative then, on Linux.

Anonymous 09/03/25(Wed)02:15:05 No.106470338

>>106470330
vim

Anonymous 09/03/25(Wed)02:22:46 No.106470369

>>106470330
https://appdb.winehq.org/objectManager.php?sClass=application&iId=10

Anonymous 09/03/25(Wed)02:28:14 No.106470395

someone posted this >>>/v/719692781 but sounds like FUD so I was wondering what do you anons think over here

Anonymous 09/03/25(Wed)02:33:38 No.106470422

>>106470395
I wish he wasn't, but he's right. Any game that packages a local model will have very specific requirements that most other games don't care about, and the LLM will be the majority of the game's size. I've researched AI in games as a concept and it's incredibly difficult to fit them in, since code is such a rigid thing and LLMs by design give any number of outputs the game needs to handle to tie AI into game mechanics. It's really difficult to make AI have any mechanical impact on the game and not just describe things or relay dialogue. And again, this is speaking with the theoretical that the AI is a local model that comprises the majority of the game's overall size. And processing power.
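
To make the "rigid code vs. any number of outputs" point concrete, here is a minimal sketch of one common workaround: force every model reply through a fixed set of actions the game already knows how to handle, and fall back to a safe default otherwise. The `Action` enum, the JSON reply shape and `parse_npc_action` are all hypothetical names invented for this illustration, not from any real engine.

```python
import json
from enum import Enum


class Action(Enum):
    # The only things the (hypothetical) game code knows how to execute.
    ATTACK = "attack"
    FLEE = "flee"
    TALK = "talk"
    IDLE = "idle"


def parse_npc_action(reply: str) -> Action:
    """Map raw model text onto one action the game code can handle."""
    try:
        data = json.loads(reply)                     # ValueError on non-JSON text
        return Action(str(data["action"]).lower())   # ValueError on unknown action
    except (ValueError, KeyError, TypeError):
        # Anything unparseable or unexpected degrades to a harmless default.
        return Action.IDLE
```

For example, `parse_npc_action('{"action": "Flee"}')` gives `Action.FLEE`, while a free-form reply like `'I draw my sword!'` falls through to `Action.IDLE`. This keeps the game from crashing, but it also shows the limitation from the post: the model only gets mechanical impact through whatever narrow set of actions was hardcoded up front.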

Anonymous 09/03/25(Wed)02:54:45 No.106470536

>>106470338
I didn't know vim had grammar checking

Anonymous 09/03/25(Wed)02:59:59 No.106470570

why is codex so much better than claude code these days

Anonymous 09/03/25(Wed)03:00:38 No.106470573

>>106468555
It wouldn't surprise me if it got canceled because there are many oversized open-weight models from China already (no more surprise factor in releasing something like that) and with Mistral's current datasets it would end up being something akin to a DeepSeek V3 variant, at this point.

Anonymous 09/03/25(Wed)03:03:43 No.106470587

>>106470395
Unlike >>106470422 I think it's feasible, but not without being very smart in the way you're using it. You need to offload most of the processes to subroutines and markov chains, you just have to keep a small llm (nowadays even 1B are very coherent) for the dialogue itself. The AGI meme has caused retarded expectations about LLMs able to thunk/act like a person. That's not gonna happen anytime soon.
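
A toy sketch of that split, assuming a made-up NPC with three states: a Markov chain drives behavior, and the small LLM would only be called while the NPC is in the `chat` state. The state names, the probabilities in `TRANSITIONS`, and the `needs_llm` helper are all invented for illustration.

```python
import random
from typing import Dict, List

# Hypothetical transition table: state -> {next_state: probability}.
TRANSITIONS: Dict[str, Dict[str, float]] = {
    "idle":   {"idle": 0.6, "patrol": 0.3, "chat": 0.1},
    "patrol": {"idle": 0.3, "patrol": 0.6, "chat": 0.1},
    "chat":   {"idle": 0.5, "patrol": 0.5},
}


def next_state(state: str, rng: random.Random) -> str:
    # Sample the next state according to the current state's weights.
    options = TRANSITIONS[state]
    return rng.choices(list(options), weights=list(options.values()), k=1)[0]


def simulate(start: str, steps: int, seed: int = 0) -> List[str]:
    # Seeded RNG so a given NPC's behavior is reproducible.
    rng = random.Random(seed)
    states = [start]
    for _ in range(steps):
        states.append(next_state(states[-1], rng))
    return states


def needs_llm(state: str) -> bool:
    # Only dialogue would hit the (expensive) language model.
    return state == "chat"
```

With these made-up weights the NPC spends most ticks in `idle` or `patrol`, so the model sits unused the majority of the time and everything mechanical stays in ordinary deterministic code.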

Anonymous 09/03/25(Wed)03:16:49 No.106470616

File: miqu.png (31 KB, 401x210)

>>106470573
Who knows if Mistral Medium 3 is actually a DeepSeek V3 finetune, just like Mistral Medium 2 was one of Llama-2-70B?
