/g/ - Technology

File: four arms.png (2.2 MB, 2120x1416)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106497597 & >>106491545

►News
>(09/05) Klear-46B-A2.5B released: https://hf.co/collections/Kwai-Klear/klear10-68ba61398a0a4eb392ec6ab1
>(09/04) Kimi K2 update for agentic coding and 256K context: https://hf.co/moonshotai/Kimi-K2-Instruct-0905
>(09/04) Tencent's HunyuanWorld-Voyager for virtual world generation: https://hf.co/tencent/HunyuanWorld-Voyager
>(09/04) Google released a Gemma embedding model: https://hf.co/google/embeddinggemma-300m
>(09/04) Chatterbox added better multilingual support: https://hf.co/ResembleAI/chatterbox

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling
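If you just want a rough offline number instead of the VRAM calculator above, here's a back-of-envelope sketch (my own approximation, not the calculator's formula): weights take about params × bits/8 bytes, and an fp16 KV cache adds about 2 × layers × kv_heads × head_dim × ctx bytes.

```python
def gguf_size_gib(params_b: float, bits_per_weight: float) -> float:
    """Rough quantized model size in GiB: params * bits / 8."""
    return params_b * 1e9 * bits_per_weight / 8 / 2**30

def kv_cache_gib(layers: int, kv_heads: int, head_dim: int,
                 ctx: int, bytes_per_elem: int = 2) -> float:
    """Rough KV cache size in GiB: K and V tensors, fp16 by default."""
    return 2 * layers * kv_heads * head_dim * ctx * bytes_per_elem / 2**30

# Example: a 12B model at ~4.8 bits/weight (Q4_K_M-ish), with
# Nemo-like dims (40 layers, 8 KV heads, 128 head dim) at 16k context.
print(round(gguf_size_gib(12, 4.8), 1))           # ~6.7 GiB of weights
print(round(kv_cache_gib(40, 8, 128, 16384), 1))  # 2.5 GiB of KV cache
```

The calculator above also accounts for compute buffers and per-quant overhead, so treat this as a floor, not a budget.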

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106497597

--Multi-GPU server hardware choices for DDR5 and NUMA optimization:
>106501160 >106501257 >106501342 >106501360 >106501442 >106501465 >106501290 >106501297 >106501417
--Token speed estimates for LLMs using GIGABYTE CXL memory card vs VRAM configurations:
>106498668 >106498678 >106498702 >106499735 >106499745 >106498766
--Optimizing VibeVoice-Large model for efficient speech generation and voice sample cleanup:
>106498676 >106498704 >106498714 >106499018 >106499389 >106499448 >106499466 >106499831 >106499967 >106500073 >106500670 >106500879 >106501145 >106501158 >106501172 >106501230 >106499863 >106499875 >106499907 >106499916 >106500081 >106500089 >106500140 >106503518
--Model recommendations for average gaming hardware with VRAM constraints:
>106502406 >106502445 >106502478 >106502521 >106502528 >106502551 >106502813 >106502914 >106502932 >106502986
--Interpretation of llama_backend_print_memory output for GPU/CPU memory usage:
>106501583 >106501653 >106501677 >106501706 >106501727 >106501822 >106501932
--DDR5 vs DDR4 tradeoffs for CPUmaxx systems with GPU support:
>106503602 >106503731 >106503756 >106503762 >106503824 >106503854 >106504044
--VibeVoice model optimization and download link:
>106498428 >106498434 >106498959 >106499005
--Anthropic's $1.5B AI settlement criticized for insufficient compensation and stifling innovation:
>106499477 >106499488 >106499521 >106499499 >106499518 >106499574 >106499693 >106502081
--AMD FlashAttention workarounds and text-to-speech project updates:
>106499449 >106499480 >106499614 >106500912
--VibeVoice TTS compatibility with quantized 7b models on low-resource hardware:
>106501006 >106501612
--Miku (free space):
>106498210 >106500301 >106503405 >106503587

►Recent Highlight Posts from the Previous Thread: >>106497599

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: file.png (31 KB, 882x151)
>update debian 13 to debian 13.1
>picrel
>>
>>106504130
No, sometimes it picks up things from context and decides what direction and inflection to take from there, but the biggest factor is the original voice sample it clones. If it has a lot of angry yelling and annoyance, that will be reflected in the result, so maybe have several emotion samples of the same voice set up as different voices?
>>
>>106504276
Thank you Recap Recap Miku
>>
>>106503862
So much technical support is behind closed doors on discord. It makes no sense; the platform was never meant for that, and now many issues will never be discussed in the open, helping no one.
>>
>>106504377
just reinstall cuda 12.8
>>
>>106504448
It's an actual tragedy.
>>
>>106504377
Not like this, John.
>>
File: file.png (44 KB, 894x263)
>>106504456
i am using cuda 12.8
>>
>>106504448
Zoomers seem to love it, probably because it's simpler than using an actual support forum.
Yet in high traffic ones it's almost unusable.
>>
>>106504485
yes get rid of it and reinstall 12.8
>>
>>106504495
>Yet in high traffic ones it's almost unusable.
so much activity is probably a big part of the attraction to those with dysfunctional attention spans
>>
>>106504448
yeah it fucking sucks
>>
>>106504448
what kills me are the projects doing that shit, choosing to use discord
>>
>>106504514
It's just memes and mundane chat spam; it makes finding useful discussions hard, especially with discord's fuzzy search.
>>
>>106504538
And then there's the moderators.
>>
>>106504520
If they're pros, I think they're just treating it as a free Slack. Except Slack is optimized for internal discussion with teams of people there to do a job, not thousands of overexcited zoomies.
I actually wonder if discord datasets exist and are used in LLM training.
>>
The tokens might be a statistical representation of language but the boners are real.
>>
File: comfymikus.png (787 KB, 1024x1280)
Comfy Mikus
>>
>>106504832
Cum on Miku's feet*
>>
>>106504448
I mean, there will be a big loss of data for stuff like that from around 2015 onwards, but since I am fairly adept technically it doesn't bother me, and you can still find help elsewhere, just in much less volume.
>>106504582
I am guessing it is valuable to some extent for data about zoomers and younger folks but I wonder how valuable it is when that demographic itself are the most affected by LLM and internet culture regurgitation and mind numbing retardation in general. No question though, the RP logs probably are equal if not 2nd to the RP forum scrapes given CAI seemed to have trained on them for their 2022 bot that everyone yearns for.
>>
>>106504702
It sends a shiver down my spine. Something primal...
>>
>>106504508
i fixed it by updating my chroot as well
>>
Does my Alice sound Alicey enough?
https://vocaroo.com/1jAce1dHRBYD
I think it sounds really muffled because I ran it through a voice cleaning model to get rid of music, but maybe it enhances the '50s mic aesthetic
>>
>>106504242
The only software optimization left is 1-2B active routed expert models.
>>
>>106504377
Bro you installed pytorch without cuda support
>>
>>106505064
It sounds great, especially
>a den of hedonous virgins
>>
File: 1754225681328634.gif (140 KB, 379x440)
>>106504242
Retardo, without the current software optimization you'd need a datacenter to run these models
>>
>>106505094
heathenous ackshully
>>
>>106504964
>that demographic itself are the most affected by LLM and internet culture regurgitation and mind numbing retardation in general
If you mean the constant virtue signaling, it's mostly a fake persona they all share in public places because they're just terrified of being judged by their friends (who are obviously also online).

>No question though, the RP logs probably are equal if not 2nd to the RP forum scrapes given CAI seemed to have trained on them for their 2022 bot that everyone yearns for.
CAI was so good, and it showed that outside of the big public servers you probably have plenty of small ones where a lot more is discussed freely.
>>
Did I miss a guide or link on OP to teaching me about using TTS in Sillytavern? Could someone kindly point me the right way if there is one?
>>
File: 1756766130320791.jpg (81 KB, 1000x707)
I await my magnum v5.
>>
File: 1745332932509593.png (3.23 MB, 914x1802)
>>106504274
For those of you who need the VibeVoice Large Weights

https://huggingface.co/aoi-ot/VibeVoice-Large
>>
really sick of comfyui. is there any other interface for vibevoice?
>>
>>106505235
Why is her ass pointed at me? I am offended.
>>
>>106505276
the original gradio interface it comes packaged with?
>>
Does anyone have Q4 of VibeVoice?
>>
>>106505316
Buy a computer rajesh
>>
>>106505408
>still obsessed with trannies, politics and indians
>>
>>106505293
https://github.com/microsoft/VibeVoice
this? There is nothing in it
>>
>>106505316
It's not prequalted, people load regular fp16 model in 4 bits. try https://github.com/wildminder/ComfyUI-VibeVoice
>>
>>106505422
prequanted*
>>
>>106505422
man please, is there any other UI? I hate how bloated that piece of shit UI is
>>
2 million context window
for free
and you keep using your local slop
smhtbhfamalam
>>
>>106505422
Ok thanks, I'll take a look. I want it to be as small as possible because I'll be running it alongside LLM.
>>
>>106505288
so you can j-j-jam it in
>>
>>106505421
find one of the forks before MS wiped it
>>
>>106505288
she is nervous and preparing her stink glands
>>
>>106505444
These local clowns dont know what they are missing out on
>>
File: 7643.jpg (153 KB, 1080x820)
>>106505444
>>
>>106505463
no UI just example scripts. these jeets just shipped it cli wtf
>>
>>106505444
have fun sending your life history to google
>>
>>106505493
Google?
>>
TTS-occupied thread.
>>
>>106505490
I wonder if these constant hype videos still work
>>
>>106505432
take the inference code from the node and use it wherever you want
>>
>>106505507
wouldn't keep making them if they didn't
>>
>>106505507
every slopwatcher is desensitized. so basically that thumbnail is appropriate and almost falls in the
>oh he's not overly ecstatic, maybe it's a cool niche channel with good content
category
>>
>>106505530
that's true
>>
>>106505432
Be the change you want to see dumbo
>>
>>106505564
I'm going to steal this
https://github.com/wildminder/ComfyUI-VibeVoice
And implement it externally. It should be pretty straightforward.
>>
File: 1756309867017273.png (1.05 MB, 774x1024)
>>106505572
if you do it without the comfy backend I would like to make a plugin for anistudio off of it. hot reloading is solved in dev so I'd like to make a few examples of models not supported in ggml yet
>>
>>106505491
CLI is all you need.
>>
Imagination is all you need.
>>
>>106505620
wrong thread
>>
>>106505628
local miku general
>>
>>106505596
I'm not a "real" dev, but I strongly suspect I can hack something together. Still, please don't hold your breath...
Been working on lots of stuff lately.
>>
>>106505641
you and me both. my uncle died yesterday so a lot of my time has been with the family. shit sucks but at least work got done despite the depression. wan support was added to sdcpp recently so I think it's almost time to get the memory management and node interface in. it's been a lot of cmake garbage juggling for the past while and I'm sick of it
>>
goybros our response?
https://github.com/microsoft/VibeVoice/issues/97
>>
>>106505758
I would argue (and I do) that the wizardlm debacle was more ridicilious. Some say it is still undergoing toxicity testing to this day
>>
>>106505757
nobody cares it took a glacial ice age to inference and image edit on your vramlet card

>>106505758
the furk? Isn't that an /ldg/ meme man?
>>
>>106505834
but 1 minute and 10s is pretty good for a 17b model with cfg and at 20 steps.. with a 110w pl
>>
>>106505848
this is the llm thread. most people here are vramlet at 48gb. just go seek attention at the diffusion threads, there are four at this point and you chose this one instead. you are fucking retarded. tts is fine because there isn't anywhere else to discuss it
>>
>>106505881
>tfw 32gb vramlet
My cope is that qwen3 30b is good enough.
>>
>>106505920
yeah... iktf
>>
                                                                                                  Mistral Large 3
>>
>>106505834
>/ldg/ meme man
do not being ridicilious
>>
>>106505952
>>
>>106505952
ugh i need it so bad
>>
>>106505952
DO NOT RELEASE!
>>
>>106505952
>"w-what the fuck is this? A DENSE 120B MODEL? HOW WILL MY MOESISSY RIG EVEN RUN IT?"
and just like that benchmaxxing moe chinks lost
>>
While looking into lossy text compression I found https://www.rwkv.com/ and have fallen into a little bit of a rabbit hole
>10/10 logo
>the official AI of Linux Foundation
>100% attention-free
>weird enough architecture that it needs its own software stack
>supposedly 400+ derivative projects
>no buzz whatsoever about it
Their models are tiny (~3B and less for main offerings) so probably not useful for anything, but I am curious about supposed speed benefits and wanted to run some performance benchmarks against similarly sized transformers-based models.
But it's fucking python.
There are goofs on hugging face by literal whos, but I suspect they are just crude conversions that lose all the architecture's buffs.
Should I give up on Arch and install Debian, or would it not help much?
>>
>>106506094
RWKV is a meme model.
>>
>>106506094
if u install debian you should use debian 12 because debian 13 has no official support for cuda yet (you have to modify some things because of glibc...)
also >rwkv
sweet summer child
>>
>>106506094
this thing has been trying to become something for years now, all the models are a shit
>>
>>106506094
>RWKV (pronounced RwaKuv)
I hate maths people.
>>
>>106506094
>rwkv
just wait until you hear about retnet and you'll be all caught up when it comes to memes people thought would totally replace transformers soon back in 2023
>>
>>106506112
RWKV models are installed on every Windows machine, making them the most successful models
>>
>>106506122
just two more years bro
>>
>>106506129
https://blog.rwkv.com/p/rwkvcpp-shipping-to-half-a-billion
holy shit its real
>its apache so its ok
cuckie
>>
the new kimi 0905 is fire.
just prompted a few medical questions on openrouter and benchmaxxed against gpt5/gemini2.5pro/opus4.1/qwenmax (openrouter system prompt off, no websearch). there were always 1-2 good additional points in the kimi answers that the other models didn't bring up. I'd accuse moonshot of prompt enhancing my query or stealthily using web search with the api call, but kimi responds so fucking fast there's no way that's happening. So yeah, idk wtf's going on.
>>
File: neat.jpg (119 KB, 1500x1155)
>>106506112
I love meme models
>>106506129
So I should install Windows 11 instead of Debian, huh.
>>
>>106506145
some intern probably added rwkv model loading to some copilot function to fuck around with it for an afternoon and they shipped the binaries by accident
>>
File: file.png (1.26 MB, 1024x1024)
qwen takes 70-80s per image on 3060
nice
>>
>>106506168
>So I should install Windows 11 instead of Debian, huh.
It only supports up to RWKV5 and they're up to 7 now.
>>
>>106506171
no, it's because it's green https://blog.rwkv.com/p/the-worlds-greenest-ai-model-rwkvs
>>
>>106506168
>Windows 11
make sure to turn on recall! it's a super helpful feature that is of course extremely secure and would never be misused by anyone.
>>
>>106506187
they said it's local* so it's ok
*local at time of recording and storage in plain text, they never promised not to upload it as part of telemetry
>>
>>106506197
I mean, it's very smart: make the user use their own compute and electricity to process the data, then send yourself the compressed telemetry result. Probably gives massive savings.
>>
Nemo will finally rest.
>>
>>106504276
lmao at the image that's brilliant
>>
>>106505098
We still need a datacenter to train models, which is the main innovation bottleneck.
>>
>>106505288
It's your new home
>>
>>106506228
yeah image of hatsune miku holding a naughty sign
>>
>>106504832
She will comfort you
>>
>>106505288
She's going to shit and piss herself and make you watch
>>
>>106506316
Hatsune Miku: Comfort Girl
>>
>>106504377
you have to use uv
>>
File: 1750478146142274.png (11 KB, 475x214)
I'm pretty sure I used Qwen3 Max thinking less than a day ago. Did they disable it?
>>
What's the difference between Mistral Nemo 12B and ReWiz Nemo 12B?
>>
>>106506353
You are not crazy, they did disable that, though I think that wasn't actually Max doing the thinking when that was on.
>>
i like the best friend remix, it's cute
>>
File: Gz_ok2fbkAANfa3.jpg (1.65 MB, 1920x1080)
>>106506388
yee she's very sweet
>>
>>106506324
More like Cumfart Girl, amirite?
>>
Which one?
https://github.com/Enemyx-net/VibeVoice-ComfyUI
https://github.com/wildminder/ComfyUI-VibeVoice
>>
>>106506439
I fucking hate comfyui developers. just try one, it shouldn't matter
>>
>>106506439
I've seen the second one mentioned here before.
>>
>>106506525
The second one last updated 3 days ago. First one last updated 4 hours ago.
>>
>>106506554
using a node system just to go from text to speech seems like overkill to me
>>
>>106506566
I don't want to use it but the official webui has no step and attention mode control, and can't add new voices without restarting.
>>
Either OR is hosting a whole bunch of faulty K2-0905 deployments or this model is just bad for being a 1T monster. GLM4.5, R1-0528 and even V3.1 are all more enjoyable and smarter.
>>
>>106506566
It's nice if you want to plug it into a bigger workflow it's just that comfy in particular is lacking 80% of the features for a proper node workflow.
>>
>>106506596
Local doesn't have this problem.
>>
>>106506607
Can you confirm the full unqanted K2-0905 is fine or are you just shitposting?
>>
>>106506591
Just modify it to let you type in a file path in a text box for the voice and hardcode your desired step count/attention mode.
>>
>>106506611
i use the official API and it's fine ;)
just sayin'
>>
>>106506611
I can confirm that you wouldn't have doubts about whether or not you're getting duped by openrouter if you were testing unquanted K2 locally.
>>
>>106506607
I guess I'll try those. I haven't gotten around downloading them but I very much hope that Q6 is significantly better than what OR is serving me because this is seriously not worth it otherwise.
>>
>>106506596
Turn off OR system prompt
>>
>openrouter
lule
>>
yes saar please use my api saar no quantized saar very good like microsoft azure saaar
>>
>>106505444
>2 million context window
>for free
Where? how slop is it? How safetymaxxed?
>>
>>106506751
Yes.
>>
>>106506732
why would you even quantize a model that is natively in 4-bit? probably because their shit backend doesn't support mxfp4
>>
>>106506751
New grok models on openrouter
it's not slop or safety maxed, just collecting all ur data
>>
>>106506784
>New grok models on openrouter
Ah i see it thanks
>Collecting all ur data
What isnt doing that?
>>
>>106506765
They say it was a "mistake" but who knows
Also check this out lol
https://x.com/andersonbcdefg/status/1955348480643477570
>>
File: 1747371153899119.png (83 KB, 705x472)
>>106505444
llama 4 scout best model have 10m context sir
shit in your face sir
>>
>>106506849
Llama 4 Reasoner when?
>>
>>106506799
The swiss 70B model that has the output quality of a 8B llama model :D
>>
>>106499389
>>106498240
>>106499448
>>106499466
>>106503518
You can clean up vocals with these.
bandit v2. This will separate vocals from background music and sound effects. Because the GitHub page has no instructions to guide you, you'll need something like Microsoft Copilot to help you.
https://github.com/kwatcharasupat/bandit-v2
Resemble Enhance. This removes background noises like the wind. Use the gradio app version for better effect.
https://github.com/resemble-ai/resemble-enhance
Also use this modded gradio app. This will only do denoising.
https://github.com/resemble-ai/resemble-enhance/issues/69
Acon Digital DeVerberate 3 plugin for audacity. This reduces reverb.
https://rutracker.org/forum/viewtopic.php?t=6118812
Moises ai pro plan does a better job at isolating vocals from background music and sound effects than bandit v2, but it costs $300; I bought it during a Black Friday sale for $150.
>>
>>106506856
just a couple more war rooms and a few more billion spent on randos who didn't accomplish anything at apple but are totally worth hiring for a hundred million a piece
then we can make the true llama4
>>
>>106506849
Bloody benchoid
I will redeem amazon free to run this beautiful basterd bitch
>>
>>106506888
>you'll need something like Microsoft Copilot to help you.
local models lost
>>
>>106506896
llama 4.20 next april will be so lit
>>
>>106506860
didnt know about that have you tried the 70b quant or the 8b?
>>
File: 30474 - SoyBooru.png (118 KB, 337x390)
Are you feeling kiwi today? (Qwen®) (More models coming soon™) (Two weeks)
>>
>>106506711
How?
>>
>>106506888
>Use the gradio app version for better effect.
>Also use this modded gradio app.
FUCKING
>>
>grok code fast performance improves at 70k+ prompt tokens
ts some kv cache magic or what's happening here. Can't do shit in a fresh session. Bloat that bitch up with some pseudo context and suddenly it's god mode

>>106506916
The one on the site. Ain't no way I'm gonna run that locally. Absolute waste of time.

>>106506933
Click the three dots on the model tab at the top of the chat page. Then disable "use openrouter system prompt". You always gotta check settings and make sure no frauds like on
>>106506732
are serving you.
>>
https://litter.catbox.moe/49eylpj3rj8ry1nz.wav
>>
Is there an indian LLM?
1.5 billion indians and no indian LLM?
saars?
>>
>>106507037
we must refuse
>>
>>106507059
You are permanently fixated on Indians.
>>
File: Gemini-2.0-Flash-001.png (951 KB, 1344x756)
>>106507059
Gemini.
>>
>>106507072
gm sir
>>
UPDATE
indexTTS2 is still not released
END UPDATE
>>
>>106507130
Who? I can't hear you over the sound of my Microsoft-sponsored ASMR.
>>
is GIGAVOICE better than the soviets, or just easier to get going with?
>>
OpenAI just released a very interesting paper
https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf
>Why Language Models Hallucinate
In short: LLMs hallucinate because we've inadvertently designed the training and evaluation process to reward confident, even if incorrect, answers, rather than honest admissions of uncertainty. Fixing this requires a shift in how we grade these systems to steer them towards more trustworthy behavior.

The Solution:

Explicitly stating "confidence targets" in evaluation instructions, where mistakes are penalized and admitting uncertainty (IDK) might receive 0 points, but guessing incorrectly receives a negative score. This encourages "behavioral calibration," where the model only answers if it's sufficiently confident.
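That rubric is easy to sanity-check with toy numbers (mine, not the paper's): score +1 for a correct answer, a negative score for a wrong guess, 0 for IDK, then compare always-guessing against abstaining below the confidence threshold.

```python
def expected_score(p_correct: float, threshold: float,
                   wrong_penalty: float = -1.0, idk_score: float = 0.0) -> float:
    """Expected score on one question under a confidence-target rubric:
    answer only if confidence >= threshold, otherwise say IDK."""
    if p_correct >= threshold:
        return p_correct * 1.0 + (1 - p_correct) * wrong_penalty
    return idk_score

# Four questions the model is 90%, 60%, 30% and 10% sure about.
confidences = [0.9, 0.6, 0.3, 0.1]

always_guess = sum(p * 1.0 + (1 - p) * -1.0 for p in confidences)
calibrated = sum(expected_score(p, threshold=0.5) for p in confidences)

print(round(always_guess, 2), round(calibrated, 2))  # guessing loses
```

With a penalty of -1 the break-even confidence is 0.5; a steeper penalty raises the threshold, which is the "behavioral calibration" knob the summary above is describing.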
>>
>>106507142
vibevoice is a generational leap above whatever was before
>>
>>106507149
we need to train them with Socratic dialogs so we have philosopher kings to rule over us.
>>
>>106506596
use mikupad or turn off all ST's formatting, you will see a massive difference
>>
>>106507149
You can really tell that all the talent left because nobody intelligent would write something this retarded.
>>
>>106507149
this is just more safety slopping
models hallucinate because coming up with new stuff that was not in the training set is their inherent and desirable property.
>>106507158
That would be cool actually.
>>
        {%- if loop.first %}
{{- "[Round " ~ (ns.rounds) ~ "] USER:" }}
{%- else %}
{{- " [Round " ~ (ns.rounds) ~ "] USER:"}}

It appears there's no way to use LongCat-Flash-Chat with most frontends except with chat completion mode.

SillyTavern's aggressively useless STScript allegedly can increment variables but even macros don't work right/consistently in instruction templates so I'm not even going to try it.
>>
>>106507149
Aw sweet, maybe now more people will understand things we've known about for ages.
>>
>>106507149
base models are more honest (at least internally) about their certainty per token
>we've inadvertently designed the training and evaluation process to reward confident, even if incorrect, answers
this has been a known problem with how they approach instruct training, another paper just stating the obvious
>>
>>106505490
Those men in the picture should have gay HIV sex with one another.
>>
>>106507180
>LongCat-Flash-Chat
>>
>>106507258
fujo go home
>>
>>106507268
Do fujo's dream of HIV sex?
>>
>>106507258
Based fujo.
>>
This is the 1.5B model, generated using the demo. Pretty amazing stuff...
https://litter.catbox.moe/0a895dbvq9a21sya.wav
If you need sound sources try this website
https://www.soundboard.com/category
>>
>>106507258
ahh ahh fujo...
>>
>>106507263
>To mitigate potential contamination from existing open-source benchmarks and enhance evaluation confidence, we meticulously constructed two new benchmarks: Meeseeks (Wang et al., 2025a) and VitaBench.
>>
>>106507409
>Meeseeks
This can't be real.
>>
File: miggy.jpg (190 KB, 992x1487)
>>106507288
>eat poster for dinner
Do it, I dare you
>>
>They think vibevoice is good
https://voca.ro/1ovxYUlilVV4
>>
File: IMG_8543.jpg (1.84 MB, 4030x2197)
>>
>>106507499
>https://voca.ro/1ovxYUlilVV4
for a second i thought you did this but in a different voice
https://vocaroo.com/1mUGlhbVCFvm
>>
>>106506094
I gave up on python for now and got the goofs for the RWKV ~3B model, and as I expected, speed seems to be in the same ballpark as Qwen3 4B.
I guess real benefits should manifest at longer context.
I'll need a more proper testing rig for this one instead of just typing "write a poem about beauty of young girl's navel" in the chat and looking at console logs.
But I'm sleepy so maybe tomorrow.
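A throwaway harness for that kind of rig might look like this; `generate` is a stand-in for whatever actually runs the model (a llama-server request, bindings, etc.), and the dummy backend below exists only so the script runs on its own:

```python
import time

def tokens_per_sec(generate, ctx_fill: int, n_predict: int) -> float:
    """Time one generation call and return decode tokens/sec."""
    start = time.perf_counter()
    produced = generate(ctx_fill, n_predict)
    return produced / (time.perf_counter() - start)

def bench(generate, ctx_fills=(512, 4096, 16384), n_predict=128):
    """Measure throughput at increasing context fills, where the
    RWKV-vs-transformer gap should actually show up."""
    return {ctx: tokens_per_sec(generate, ctx, n_predict) for ctx in ctx_fills}

# Dummy backend: pretends decode slows down as context grows, the way a
# transformer's KV-bound decode does. Swap in your real call here.
def fake_generate(ctx_fill: int, n_predict: int) -> int:
    time.sleep(0.005 * (1 + ctx_fill / 4096))
    return n_predict

for ctx, tps in bench(fake_generate).items():
    print(f"{ctx:>6} ctx: {tps:.0f} tok/s")
```

Run the same context fills against the RWKV goofs and a similarly sized transformer and the long-context difference (or lack of one) should be obvious from the table.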
>>
>>106507499
>>106507514
Where are you getting voice samples? That's pretty hard, plus most downloadable clips are often pretty noisy or have background music.
>>
>>106507539
>most clips downloadable clips are often pretty noisy or have background music.
this might help
https://github.com/Anjok07/ultimatevocalremovergui
>>
>>106507499
that sounds perfectly fine to me, what problems do you have with it exactly?
>>
>>106507512
I like these Bakas and Dipsy
>>
>>106507548
Thanks. Also figuring out why inference_from_file.py won't use cuda. Cuda should be enabled by default...
>>
>>106507514
Good idea. Here's my try: https://voca.ro/13jnfkR0QnPf
>>
>>106507149
I thought they hallucinate because when a completion is poorly represented in the dataset, the model generates a probability set that's likely to pick a bad token.
Post training with more "I don't know" answers might help, yeah. Though it would have to pull the weights pretty strongly to overcome all the other non-confident possibilities.
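That intuition in toy form (made-up logits, pure illustration): when training gave little signal, the logits over continuations are near-flat, yet greedy decoding still emits the top token as if the model were sure.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Well-covered fact: one continuation dominates the data.
seen_often = softmax([8.0, 1.0, 0.5, 0.2])
# Poorly covered fact: several plausible continuations, near-flat logits.
seen_rarely = softmax([1.1, 1.0, 0.9, 0.8])

print(max(seen_often))   # top token near certainty
print(max(seen_rarely))  # top token barely ahead of the rest, but greedy
                         # decoding states it just as flatly
```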
>>
>>106507553
That was the latest gptsovits, not vibevoice. Eat barely 4GB of VRAM and took 2s to generate that. I still can't understand the hype over M$ new toy.
>>
>>106507589
this one sounds pretty fucking bad on the other hand
>>106507596
oh, that explains why it completely shat the bed when it got to grotesqueries; sovits's vocabulary is utterly piss poor and almost a complete deal breaker for me
>>
File: bakas.mp4 (2.98 MB, 1280x672)
>>106507512
>>
Don't you dare!
>>
>>106507616
adorable
>>
>>106507613
I used whisper on your sample to get the text so it's understandable. The pronunciation is easy to fix if you pass the arpabet transcription directly (I integrated it in my api)
>>
>>106507654
I'm not gonna deal with all that when VibeVoice sounds five times more natural and can say pretty much every word I've thrown at it without fiddling around with syntax and shit. And again, I just drop a minute-long sample to clone instead of going through the tiresome and lengthy training process for each new voice with Sovits.
Sovits can sound fine when you carefully cherry pick the good generations, but even then it tends to sound stilted. Sovits is like 2 generations behind the curve here
>>
>>106507704
Well we will see if they drop their training script first, then I might give it a go. Not being able to finetune it is doa for me
>>
>>106507725
I don't see that ever happening.
>>
>https://github.com/ggml-org/llama.cpp/pull/15327
So does this really mean we can finally use models like how they do in image gen, where you just download the base model and the loras you want?
>>
>>106507725
I can see that happening.
>>
>>106507763
>where you just download the base model and the loras you want?
I can already do that. You could for a while.
aLoRA is about changing those during runtime, right?
>>
>>106507800
>I can already do that
So why don't any of us just do that then? Why do people still upload and download the merged weights?
>>
>>106507824
Because the average retard user can't even get a single-click installer working without copius amount of hand-holding. Trying to explain loras and usage would be asking to much and would hurt download stats. Easier to just provide a plug and play model.
>>
>>106507824
Sorry, meant to write
>I'm pretty sure we can already do that
Also, back in the day, there used to be a couple of LoRas out and about.
Notably, SuperCOT and SuperHOT.
Hell, in the PR itself there's a normal LoRa alongside the aLoRa.
For some reason, we all just decided to distribute pre-merged weights instead of just the LoRa, no idea why.
>>
>>106507835
But local image gen has more users and they deal with needing to mess with loras fine.
>>
>>106507704
https://litter.catbox.moe/rehari2tvedhwccm.wav
>>
>>106507616
Ty, saved.
>>
>>106507863
Loras are basically required for image gen so the frontends make them a major component and easy to add/set. Less important for giant text models that can't be so easily changed, so they're basically an afterthought. It means messing with the scary command-line arguments instead of a file selection field on a web interface.
>>
>>106507866
kek
>>
>>106507866
wtf i love sovits now
>>
>>106507866
Depending on whether that's sovits or not, you either proved his point with how lifeless that sounds, or vibe sucks
>>
>>106508067
Sir, this is 1.5B VibeVoice.
>>
>>106508067
It sounds stilted as all fuck. My money is on it being soviet.
>>
>>106508090
How long did that take to generate? I have the large weights downloaded from the torrent link posted a couple threads ago, but I want to know whether it's worth using the big version or the smaller version (haven't tested either yet)
>>
How are you guys using VV? Any rentry for retards to set it up with ST?
>>
>>106504274
Where the hell are you finding the money for all the gpus to run this shit
>>
>>106508142
Steady employment.
>>
>>106508138
>>106501145
>>
>>106508142
Money just appears in my bank account every month. It's crazy.
>>
>>106508142
GPUs? Poor people like us CPUmaxx
>>
>>106508212
I hope you at least tell your parents thank you
>>
File: 21522 - SoyBooru.png (46 KB, 457x694)
46 KB
46 KB PNG
Was 'berry the worst marketing campaign that started the downfall of OpenAI and killed the hype?
>>
multi-token prediction status?
>>
>>106508193
I mean, I could afford around 10k worth if I really wanted to, but there are better things I could do with 10k
>>
>>106508399
lazy ggergachod won't do the needful, kindly ask ikawrakow
Or wait until cloud models can code it for us, we better not be hitting the wall.
>>
>>106508121
1.5B is comparable to SDXL image gen speed at 1024x1024.
Just use the large model if you can.
>>
>>106508415
Such as?
>>
can a rtx pro 6000 gen vibevoice large in real time?
>>
File: moatboy at google hq.png (1.83 MB, 1024x1024)
1.83 MB
1.83 MB PNG
Any Google insiders here? How are Gemini 3/Gemma update working out? Did you hit the wall too like the rest?
>>
https://voca.ro/14hcU3N3ZLxZ
>>
>>106508142
Having money saved up from previous work. I have a 3090 but spending on more ram seems a lot more worth it now compared to adding an extra gpu due to all the fuckhuge models coming out recently.
>>
>>106508444
Gemini? We're now simulating the next generation of LLMs within our Genie 3 world model, which uses its capabilities to manifest a SOTA LLM writing responses within its virtual world.
>>
>>106508461
How safe is it?
>>
>>106508447
Is that your voice?
>>
>>106508461
How good are they at math and coding?
>>
>>106508480
>>106508506
The only questions investors care about
>>
>>106507866
This is just a recording, you can't fool me!!
>>
>>106508425
Student loans, this year's Roth contribution, emergency fund, saving for a house (hopefully the market crashes), paying a lawyer to research a business idea I've been sitting on
>>
File: afis.jpg (18 KB, 516x532)
18 KB
18 KB JPG
My AFIS roleplay just got voices and they are GOOD! VibeVoice really is impressive. There's probably an easy way to integrate it with ST as well.
>>
>>106508552
Post an example.
>>
>>106508552
Large is pretty sensitive to the quality of the input voice data. Give it clear, smooth studio quality and it will get close to it; it has a massive range in voice type and age.
>>
English is cool and all, but I'm not moving off sovits unless someone can demonstrate jap abilities better than what i get with my custom trained model paired with clean samples.
>>
VibeVoice recognized gluck gluck gluck as blowjob noises, lmao I'm feeling real unsafe now
>>
>>106508621
We need like a list of sounds it recognizes, it seems a bit random
>>
File: Taylor County PSA.jpg (425 KB, 2655x1500)
425 KB
425 KB JPG
>Real niggas listen to what they feel in they gut after a long shift at the warehouse or when they ridin’ out to the track to hustle horses or pedicabs. If ya real, ya feel: trap beats, Drill tempo (especially from UK drill, or Miami slime), but also music a man love his daughter to.
w-what?
>>
>>106508641
lolwut
>>
>>106508596
https://vocaroo.com/12A6GA08pA5C
This is with ~10s of uncleaned input audio per voice/speaker. The original voices are also low fidelity, that's not VibeVoice crushing them btw.

>>106508604
Yeah, I've been playing around with it for a bit now and it seems quite versatile. I wonder if there's a way to get it to do laughs or perhaps precisely fiddle with the inflection mid sentence? I think I remember doing something like that with TortoiseTTS a while ago.
>>
>>106508831
It seemingly tries to emulate the bitrate and noise of the original clip, and maybe even exaggerates them.
>>
>>106508848
Wouldn't say it exaggerates the effect. Sounds pretty spot on to me, especially Fox. Only thing I'd like to improve is the inflections and the random bouts of background noise. Probably could all be fixed with better and longer input voices. I should probably clean it up a bit and give it more than 10 seconds per voice.
>>
>>106508831
Are you just using the default settings? I'm getting good results in between ones that just spazz the fuck out lol
>>
>>106506094
>I am curious about supposed speed benefits
RWKV is one specific state-space model (SSM) architecture. The chief practical difference from transformers is an "infinite" but lossy context: as the context grows, runtime performance won't degrade, but the memory of older tokens will gradually fade. SSM proponents also argue that the context compression intrinsic to SSMs produces superior results in some use cases.

Any RWKV models I've played with were retarded, seemingly from the training regime rather than architecture. Bo Peng claims the newer models are at least on-par with transformers of the same size. I'm awaiting the newest 14B to check it out again.
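The constant-memory property can be shown with a toy linear-attention-style recurrence (not the actual RWKV kernels, just the idea): the state is one fixed-size matrix updated per token, while a transformer's KV cache grows with every token. Dims and the decay value below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                                  # toy head dimension (made up)
decay = 0.95                            # per-channel decays would be learned in RWKV

def run(seq_len):
    S = np.zeros((d, d))                # recurrent state: fixed size forever
    kv_cache = []                       # what a transformer keeps instead
    for _ in range(seq_len):
        k, v, q = (rng.standard_normal(d) for _ in range(3))
        S = decay * S + np.outer(v, k)  # old tokens fade geometrically
        out = S @ q                     # read-out for the current token
        kv_cache.append((k, v))         # transformer memory grows linearly
    return S, kv_cache

S_short, cache_short = run(10)
S_long, cache_long = run(1000)

assert S_short.shape == S_long.shape == (d, d)    # SSM state: constant memory
assert len(cache_long) == 100 * len(cache_short)  # KV cache: linear growth
```

This also shows the lossiness: with decay < 1, a token's contribution to the state shrinks every step, which is exactly the "memory of older tokens gradually fades" trade-off.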
>>
>>106508641
https://vocaroo.com/141Mzkn5YKGh
>>
>serious
Is there an open-source vision LLM with an open license that's on par with Gemma? I can't find one that matches abliterated Gemma because they're all STEM benchmaxxed.

I guess you could say I'm looking for the Nemo of vision, but sadly Pixtral doesn't cut it either.
>>
what happened to petra? i just saw the name somewhere and made me think of /lmg/ kek
>>
>>106509086
The official RWKV models will never be good because they are trained on EleutherAI-sourced open training data on a shoestring budget. It will take a commercial company to make something half-decent with this architecture.
>>
do we have any prompting guides for vibevoice? curious if there's any way to control it beyond just a simple script
>>
any way of using my 6700 XT with my 9070 XT for 28GB of VRAM?


