/g/ - Technology




File: four arms.png (2.2 MB, 2120x1416)
/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106497597 & >>106491545

►News
>(09/05) Klear-46B-A2.5B released: https://hf.co/collections/Kwai-Klear/klear10-68ba61398a0a4eb392ec6ab1
>(09/04) Kimi K2 update for agentic coding and 256K context: https://hf.co/moonshotai/Kimi-K2-Instruct-0905
>(09/04) Tencent's HunyuanWorld-Voyager for virtual world generation: https://hf.co/tencent/HunyuanWorld-Voyager
>(09/04) Google released a Gemma embedding model: https://hf.co/google/embeddinggemma-300m
>(09/04) Chatterbox added better multilingual support: https://hf.co/ResembleAI/chatterbox

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
>>
►Recent Highlights from the Previous Thread: >>106497597

--Multi-GPU server hardware choices for DDR5 and NUMA optimization:
>106501160 >106501257 >106501342 >106501360 >106501442 >106501465 >106501290 >106501297 >106501417
--Token speed estimates for LLMs using GIGABYTE CXL memory card vs VRAM configurations:
>106498668 >106498678 >106498702 >106499735 >106499745 >106498766
--Optimizing VibeVoice-Large model for efficient speech generation and voice sample cleanup:
>106498676 >106498704 >106498714 >106499018 >106499389 >106499448 >106499466 >106499831 >106499967 >106500073 >106500670 >106500879 >106501145 >106501158 >106501172 >106501230 >106499863 >106499875 >106499907 >106499916 >106500081 >106500089 >106500140 >106503518
--Model recommendations for average gaming hardware with VRAM constraints:
>106502406 >106502445 >106502478 >106502521 >106502528 >106502551 >106502813 >106502914 >106502932 >106502986
--Interpretation of llama_backend_print_memory output for GPU/CPU memory usage:
>106501583 >106501653 >106501677 >106501706 >106501727 >106501822 >106501932
--DDR5 vs DDR4 tradeoffs for CPUmaxx systems with GPU support:
>106503602 >106503731 >106503756 >106503762 >106503824 >106503854 >106504044
--VibeVoice model optimization and download link:
>106498428 >106498434 >106498959 >106499005
--Anthropic's $1.5B AI settlement criticized for insufficient compensation and stifling innovation:
>106499477 >106499488 >106499521 >106499499 >106499518 >106499574 >106499693 >106502081
--AMD FlashAttention workarounds and text-to-speech project updates:
>106499449 >106499480 >106499614 >106500912
--VibeVoice TTS compatibility with quantized 7b models on low-resource hardware:
>106501006 >106501612
--Miku (free space):
>106498210 >106500301 >106503405 >106503587

►Recent Highlight Posts from the Previous Thread: >>106497599

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>
File: file.png (31 KB, 882x151)
>update debian 13 to debian 13.1
>picrel
>>
>>106504130
No, sometimes it picks up things from context and decides what direction and inflection to take from there, but the biggest factor is the original sample voice it clones: if it has a lot of angry yelling and annoyance, that will be reflected in the result. So maybe have several emotion samples for the same voice set up as different voices?
>>
>>106504276
Thank you Recap Recap Miku
>>
>>106503862
So much technical support is behind closed doors on Discord. It makes no sense; the platform was never meant for that, and now many issues will never be discussed in the open, helping no one.
>>
>>106504377
just reinstall cuda 12.8
>>
>>106504448
It's an actual tragedy.
>>
>>106504377
Not like this, John.
>>
File: file.png (44 KB, 894x263)
>>106504456
i am using cuda 12.8
>>
>>106504448
Zoomers seem to love it, probably because it's simpler than using an actual support forum.
Yet in high traffic ones it's almost unusable.
>>
>>106504485
yes get rid of it and reinstall 12.8
>>
>>106504495
>Yet in high traffic ones it's almost unusable.
so much activity is probably a big part of the attraction to those with dysfunctional attention spans
>>
>>106504448
yeah it fucking sucks
>>
>>106504448
what kills me are the projects doing that shit, choosing to use discord
>>
>>106504514
It's just memes and mundane chat spam, it makes finding useful discussions hard, especially with discord's fuzzy search.
>>
>>106504538
And then there's the moderators.
>>
>>106504520
If they're pros, I think they're just treating it as a free Slack. Except Slack is optimized for internal discussion with teams of people there to do a job, not thousands of overexcited zoomies.
I actually wonder if discord datasets exist and are used in LLM training.
>>
The tokens might be a statistical representation of language but the boners are real.
>>
File: comfymikus.png (787 KB, 1024x1280)
Comfy Mikus
>>
>>106504832
Cum on Miku's feet*
>>
>>106504448
I mean, there will be a big loss of data for stuff like that from around 2015 onwards, but since I'm fairly adept technically it doesn't bother me, and you can still find help elsewhere, just in much less volume.
>>106504582
I am guessing it is valuable to some extent for data about zoomers and younger folks but I wonder how valuable it is when that demographic itself are the most affected by LLM and internet culture regurgitation and mind numbing retardation in general. No question though, the RP logs probably are equal if not 2nd to the RP forum scrapes given CAI seemed to have trained on them for their 2022 bot that everyone yearns for.
>>
>>106504702
It sends a shiver down my spine. Something primal...
>>
>>106504508
i fixed it by updating my chroot as well
>>
Does my Alice sound Alicey enough?
https://vocaroo.com/1jAce1dHRBYD
I think it sounds really muffled because I ran it through a voice cleaning model to get rid of music, but maybe it enhances the '50s mic aesthetic
>>
>>106504242
The only software optimization left is 1-2B active routed expert models.
>>
>>106504377
Bro you installed pytorch without cuda support
>>
>>106505064
It sounds great, especially
>a den of hedonous virgins
>>
File: 1754225681328634.gif (140 KB, 379x440)
>>106504242
Retardo, without the current software optimization you'd need a datacenter to run these models
>>
>>106505094
heathenous ackshully
>>
>>106504964
>that demographic itself are the most affected by LLM and internet culture regurgitation and mind numbing retardation in general
If you mean the constant virtue signaling, it's mostly a fake persona they all share in public places because they're just terrified of being judged by their friends (who are obviously also online).

>No question though, the RP logs probably are equal if not 2nd to the RP forum scrapes given CAI seemed to have trained on them for their 2022 bot that everyone yearns for.
CAI was so good, and it showed that outside of the big public servers you probably have plenty of small ones where a lot more is discussed freely.
>>
Did I miss a guide or link in the OP teaching me about using TTS in SillyTavern? Could someone kindly point me in the right direction if there is one?
>>
File: 1756766130320791.jpg (81 KB, 1000x707)
I await my magnum v5.
>>
File: 1745332932509593.png (3.23 MB, 914x1802)
>>106504274
For those of you who need the VibeVoice Large Weights

https://huggingface.co/aoi-ot/VibeVoice-Large
>>
really sick of comfyui. is there any other interface for vibevoice?
>>
>>106505235
Why is her ass pointed at me? I am offended.
>>
>>106505276
the original gradio interface it comes packaged with?
>>
Does anyone have Q4 of VibeVoice?
>>
>>106505316
Buy a computer rajesh
>>
>>106505408
>still obsessed with trannies, politics and indians
>>
>>106505293
https://github.com/microsoft/VibeVoice
this? There is nothing in it
>>
>>106505316
It's not prequanted; people load the regular fp16 model in 4 bits. Try https://github.com/wildminder/ComfyUI-VibeVoice
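For anyone wondering what "loading fp16 in 4 bits" actually means: the loader quantizes each weight tensor on the fly as it's read (in practice via something like bitsandbytes NF4; what follows is a toy absmax int4 round-trip in pure Python to illustrate the idea, not the node's actual code):

```python
def quantize_int4_absmax(weights):
    """Toy absmax quantization: scale floats into the signed int4 range [-7, 7]."""
    scale = (max(abs(w) for w in weights) / 7.0) or 1.0  # avoid div-by-zero for all-zero tensors
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate fp values; per-weight error is bounded by scale / 2."""
    return [v * scale for v in q]

w = [1.0, -0.5, 0.25, 0.0]
q, s = quantize_int4_absmax(w)   # q == [7, -4, 2, 0]
w_hat = dequantize(q, s)
```

You pay a small, bounded reconstruction error per weight in exchange for ~4x less memory, which is why the fp16 checkpoint fits on much smaller cards.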
>>
>>106505422
man please, is there any other UI? I hate how bloated that piece of shit UI is
>>
2 million context window
for free
and you keep using your local slop
smhtbhfamalam
>>
>>106505422
Ok thanks, I'll take a look. I want it to be as small as possible because I'll be running it alongside LLM.
>>
>>106505288
so you can j-j-jam it in
>>
>>106505421
find one of the forks before MS wiped it
>>
>>106505288
she is nervous and preparing her stink glands
>>
>>106505444
These local clowns dont know what they are missing out on
>>
File: 7643.jpg (153 KB, 1080x820)
153 KB
153 KB JPG
>>106505444
>>
>>106505463
no UI just example scripts. these jeets just shipped it cli wtf
>>
>>106505444
have fun sending your life history to google
>>
>>106505493
Google?
>>
TTS-occupied thread.
>>
>>106505490
I wonder if these constant hype videos still work
>>
>>106505432
take the inference code from the node and use it wherever you want
>>
>>106505507
wouldn't keep making them if they didn't
>>
>>106505507
every slopwatcher is desensitized. so basically that thumbnail is appropriate and almost falls in the
>oh he's not overly ecstatic, maybe it's a cool niche channel with good content
category
>>
>>106505530
that's true
>>
>>106505432
Be the change you want to see dumbo
>>
>>106505564
I'm going to steal this
https://github.com/wildminder/ComfyUI-VibeVoice
And implement it externally. It should be pretty straightforward.
>>
File: 1756309867017273.png (1.05 MB, 774x1024)
>>106505572
if you do it without the comfy backend I would like to make a plugin for anistudio off of it. hot reloading is solved in dev so I'd like to make a few examples of models not supported in ggml yet
>>
>>106505491
CLI is all you need.
>>
Imagination is all you need.
>>
>>106505620
wrong thread
>>
>>106505628
local miku general
>>
>>106505596
I'm not a "real" dev but I strongly suspect I can hack something together. Still, please don't hold your breath...
Been working on lots of stuff lately.
>>
>>106505641
you and me both. my uncle died yesterday so a lot of time has been spent with the family. shit sucks but at least work was done despite the depression. wan support was added to sdcpp recently so I think it's almost time to get the memory management and node interface in. it's been a lot of cmake garbage juggling for the past while and I'm sick of it
>>
goybros our response?
https://github.com/microsoft/VibeVoice/issues/97
>>
>>106505758
I would argue (and I do) that the wizardlm debacle was more ridicilious. Some say it is still undergoing toxicity testing to this day
>>
>>106505757
nobody cares it took a glacial ice age to inference and image edit on your vramlet card

>>106505758
the furk? Isn't that an /ldg/ meme man?
>>
>>106505834
but 1 minute and 10s is pretty good for a 17b model with cfg and at 20 steps.. with a 110w pl
>>
>>106505848
this is the llm thread. most people here are vramlet at 48gb. just go seek attention at the diffusion threads, there are four at this point and you chose this one instead. you are fucking retarded. tts is fine because there isn't anywhere else to discuss it
>>
>>106505881
>tfw 32gb vramlet
My cope is that qwen3 30b is good enough.
>>
>>106505920
yeah... iktf
>>
                                                                                                  Mistral Large 3
>>
>>106505834
>/ldg/ meme man
do not being ridicilious
>>
>>106505952
>>
>>106505952
ugh i need it so bad
>>
>>106505952
DO NOT RELEASE!
>>
>>106505952
>"w-what the fuck is this? A DENSE 120B MODEL? HOW WILL MY MOESISSY RIG EVEN RUN IT?"
and just like that benchmaxxing moe chinks lost
>>
While looking into lossy text compression I found https://www.rwkv.com/ and have fallen into a little bit of a rabbit hole
>10/10 logo
>the official AI of Linux Foundation
>100% attention-free
>weird enough architecture that it needs its own software stack
>supposedly 400+ derivative projects
>no buzz whatsoever about it
Their models are tiny (~3B and less for main offerings) so probably not useful for anything, but I am curious about supposed speed benefits and wanted to run some performance benchmarks against similarly sized transformers-based models.
But it's fucking python.
There are goofs on hugging face by literally-whos, but I suspect they are just crude conversions that lose all the architecture buffs.
Should I give up on Arch and install Debian, or would it not help much?
>>
>>106506094
RWKV is a meme model.
>>
>>106506094
if u install debian you should use debian 12 because debian 13 has no official support for cuda yet (you have to modify some things because of glibc...)
also >rwkv
sweet summer child
>>
>>106506094
this thing has been trying to become something for years now, all the models are a shit
>>
>>106506094
>RWKV (pronounced RwaKuv)
I hate maths people.
>>
>>106506094
>rwkv
just wait until you hear about retnet and you'll be all caught up when it comes to memes people thought would totally replace transformers soon back in 2023
>>
>>106506112
RWKV models are installed on every Windows machine, making them the most successful models
>>
>>106506122
just two more years bro
>>
>>106506129
https://blog.rwkv.com/p/rwkvcpp-shipping-to-half-a-billion
holy shit its real
>its apache so its ok
cuckie
>>
the new kimi 0905 is fire.
just prompted a few medical questions on openrouter and benchmaxxed against gpt5/gemini2.5pro/opus4.1/qwenmax. (openrouter system prompt off, no websearch). there were always 1-2 good additional points in the kimi answers the other models didn't bring up. I'd accuse moonshot of prompt-enhancing my query or stealthily using web search with the api call, but kimi responds so fucking fast, there's no way that's happening. So yeah, idk wtf's going on.
>>
File: neat.jpg (119 KB, 1500x1155)
>>106506112
I love meme models
>>106506129
So I should install Windows 11 instead of Debian, huh.
>>
>>106506145
some intern probably added rwkv model loading to some copilot function to fuck around with it for an afternoon and they shipped the binaries by accident
>>
File: file.png (1.26 MB, 1024x1024)
qwen takes 70-80s per image on 3060
nice
>>
>>106506168
>So I should install Windows 11 instead of Debian, huh.
It only supports up to RWKV5 and they're up to 7 now.
>>
>>106506171
no, it's because it's green: https://blog.rwkv.com/p/the-worlds-greenest-ai-model-rwkvs
>>
>>106506168
>Windows 11
make sure to turn on recall! its a super helpful feature that is of course extremely secure and would never be misused by anyone.
>>
>>106506187
they said it's local* so it's ok
*local at time of recording and storage in plain text, they never promised not to upload it as part of telemetry
>>
>>106506197
I mean it's very smart, make the user use their compute and electricity to process the data then send yourself the compressed telemetry result, probably gives massive savings
>>
Nemo will finally rest.
>>
>>106504276
lmao at the image that's brilliant
>>
>>106505098
We still need a datacenter to train models, which is the main innovation bottleneck.
>>
>>106505288
It's your new home
>>
>>106506228
yeah image of hatsune miku holding a naughty sign
>>
>>106504832
She will comfort you
>>
>>106505288
She's going to shit and piss herself and make you watch
>>
>>106506316
Hatsune Miku: Comfort Girl
>>
>>106504377
you have to use uv
>>
File: 1750478146142274.png (11 KB, 475x214)
I'm pretty sure I used Qwen3 Max thinking less than a day ago. Did they disable it?
>>
What's the difference between Mistral Nemo 12B and ReWiz Nemo 12B?
>>
>>106506353
You are not crazy, they did disable that, though I think that wasn't actually Max doing the thinking when that was on.
>>
i like the best friend remix, it's cute
>>
File: Gz_ok2fbkAANfa3.jpg (1.65 MB, 1920x1080)
>>106506388
yee she's very sweet
>>
>>106506324
More like Cumfart Girl, amirite?
>>
Which one?
https://github.com/Enemyx-net/VibeVoice-ComfyUI
https://github.com/wildminder/ComfyUI-VibeVoice
>>
>>106506439
I fucking hate comfyui developers. just try one, it shouldn't matter
>>
>>106506439
I've seen the second one mentioned here before.
>>
>>106506525
The second one was last updated 3 days ago; the first one, 4 hours ago.
>>
>>106506554
using a node system just to go from text to speech seems like overkill to me
>>
>>106506566
I don't want to use it but the official webui has no step and attention mode control, and can't add new voices without restarting.
>>
Either OR is hosting a whole bunch of faulty K2-0905 deployments or this model is just bad for being a 1T monster. GLM4.5, R1-0528 and even V3.1 are all more enjoyable and smarter.
>>
>>106506566
It's nice if you want to plug it into a bigger workflow it's just that comfy in particular is lacking 80% of the features for a proper node workflow.
>>
>>106506596
Local doesn't have this problem.
>>
>>106506607
Can you confirm the full unqanted K2-0905 is fine or are you just shitposting?
>>
>>106506591
Just modify it to let you type in a file path in a text box for the voice and hardcode your desired step count/attention mode.
>>
>>106506611
i use the official API and it's fine ;)
just sayin'
>>
>>106506611
I can confirm that you wouldn't have doubts about whether or not you're getting duped by openrouter if you were testing unquanted K2 locally.
>>
>>106506607
I guess I'll try those. I haven't gotten around to downloading them, but I very much hope that Q6 is significantly better than what OR is serving me, because this is seriously not worth it otherwise.
>>
>>106506596
Turn off OR system prompt
>>
>openrouter
lule
>>
yes saar please use my api saar no quantized saar very good like microsoft azure saaar
>>
>>106505444
>2 million context window
>for free
Where? how slop is it? How safetymaxxed?
>>
>>106506751
Yes.
>>
>>106506732
why would you even quantize a model that is natively in 4-bit? probably because their shit backend doesn't support mxfp4
>>
>>106506751
New grok models on openrouter
it's not slop or safety maxed, just collecting all ur data
>>
>>106506784
>New grok models on openrouter
Ah i see it thanks
>Collecting all ur data
What isnt doing that?
>>
>>106506765
They say it was a "mistake" buy who knows
Also check this out lol
https://x.com/andersonbcdefg/status/1955348480643477570
>>
File: 1747371153899119.png (83 KB, 705x472)
>>106505444
llama 4 scout best model have 10m context sir
shit in your face sir
>>
>>106506849
Llama 4 Reasoner when?
>>
>>106506799
The swiss 70B model that has the output quality of a 8B llama model :D
>>
>>106499389
>>106498240
>>106499448
>>106499466
>>106503518
You can clean up vocals with these.
bandit v2. This will separate vocals from background music and sound effects. Because the GitHub page has no instructions to guide you, you'll need something like Microsoft Copilot to help you.
https://github.com/kwatcharasupat/bandit-v2
Resemble Enhance. This removes background noises like the wind. Use the gradio app version for better effect.
https://github.com/resemble-ai/resemble-enhance
Also use this modded gradio app. This will only do denoising.
https://github.com/resemble-ai/resemble-enhance/issues/69
Acon Digital DeVerberate 3 plugin for audacity. This reduces reverb.
https://rutracker.org/forum/viewtopic.php?t=6118812
Moises ai pro plan does a better job at isolating vocals from background music and sound effects than bandit v2 but it costs $300, i bought it during a black friday sale for $150.
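As a sketch of how the Resemble Enhance step slots into a cleanup pipeline from the command line (the directory names are placeholders, and the flag spelling is from memory of the pip package; check `resemble-enhance --help` since it may differ by version):

```shell
# install the enhancer (pulls in torch)
pip install resemble-enhance

# denoise + enhance every wav in ./raw, writing results to ./clean
resemble-enhance ./raw ./clean

# or only strip background noise, skipping the enhancement pass
resemble-enhance ./raw ./clean --denoise_only
```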
>>
>>106506856
just a couple more war rooms and a few more billion spent on randos who didn't accomplish anything at apple but are totally worth hiring for a hundred million a piece
then we can make the true llama4
>>
>>106506849
Bloody benchoid
I will redeem amazon free to run this beautiful basterd bitch
>>
>>106506888
>you'll need something like Microsoft Copilot to help you.
local models lost
>>
>>106506896
llama 4.20 next april will be so lit
>>
>>106506860
didnt know about that have you tried the 70b quant or the 8b?
>>
File: 30474 - SoyBooru.png (118 KB, 337x390)
Are you feeling kiwi today? (Qwen®) (More models coming soon™) (Two weeks)
>>
>>106506711
How?
>>
>>106506888
>Use the gradio app version for better effect.
>Also use this modded gradio app.
FUCKING
>>
>grok code fast performance improves at 70k+ prompt tokens
ts some kv cache magic or what's happening here. Can't do shit in a fresh session. Bloat that bitch up with some pseudo context and suddenly it's god mode

>>106506916
The one on the site. Ain't no way I'm gonna run that locally. Absolute waste of time.

>>106506933
Click the three dots on the model tab at the top of the chat page. Then disable "use openrouter system prompt". You always gotta check settings and make sure no frauds like on
>>106506732
are serving you.
>>
https://litter.catbox.moe/49eylpj3rj8ry1nz.wav
>>
Is there an indian LLM?
1.5 billion indians and no indian LLM?
saars?
>>
>>106507037
we must refuse
>>
>>106507059
You are permanently fixated on Indians.
>>
File: Gemini-2.0-Flash-001.png (951 KB, 1344x756)
>>106507059
Gemini.
>>
>>106507072
gm sir
>>
UPDATE
indexTTS2 is still not released
END UPDATE
>>
>>106507130
Who? I can't hear you over the sound of my Microsoft-sponsored ASMR.
>>
is GIGAVOICE better than the soviets, or just easier to get going with?
>>
OpenAI just released a very interesting paper
https://cdn.openai.com/pdf/d04913be-3f6f-4d2b-b283-ff432ef4aaa5/why-language-models-hallucinate.pdf
>Why Language Models Hallucinate
In short: LLMs hallucinate because we've inadvertently designed the training and evaluation process to reward confident, even if incorrect, answers, rather than honest admissions of uncertainty. Fixing this requires a shift in how we grade these systems to steer them towards more trustworthy behavior.

The Solution:

Explicitly stating "confidence targets" in evaluation instructions, where mistakes are penalized and admitting uncertainty (IDK) might receive 0 points, but guessing incorrectly receives a negative score. This encourages "behavioral calibration," where the model only answers if it's sufficiently confident.
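That grading scheme fits in a few lines. Assume a confidence target t where a correct answer scores +1, "I don't know" scores 0, and a wrong answer costs t/(1-t) points; then guessing only has positive expected value when the model's internal confidence exceeds t. This is an illustrative reading of the summary above, not code from the paper:

```python
def wrong_penalty(t):
    """Penalty for an incorrect answer under confidence target t (0 < t < 1)."""
    return t / (1.0 - t)

def expected_score(p, t):
    """Expected score of answering when the model believes it is right with probability p."""
    return p * 1.0 - (1.0 - p) * wrong_penalty(t)

def calibrated_action(p, t):
    """Answer only when guessing beats the 0 points guaranteed by abstaining."""
    return "answer" if expected_score(p, t) > 0 else "IDK"
```

At t = 0.5 a wrong answer costs exactly one point, so the break-even confidence is 50%; raising t pushes the model toward abstaining more often.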
>>
>>106507142
vibevoice is a generational leap above whatever was before
>>
>>106507149
we need to train them with Socratic dialogs so we have philosopher kings to rule over us.
>>
>>106506596
use mikupad or turn off all ST's formatting, you will see a massive difference
>>
>>106507149
You can really tell that all the talent left because nobody intelligent would write something this retarded.
>>
>>106507149
this is just more safety slopping
models hallucinate because coming up with new stuff that was not in the training set is their inherent and desirable property.
>>106507158
That would be cool actually.
>>
        {%- if loop.first %}
{{- "[Round " ~ (ns.rounds) ~ "] USER:" }}
{%- else %}
{{- " [Round " ~ (ns.rounds) ~ "] USER:"}}

It appears there's no way to use LongCat-Flash-Chat with most frontends except with chat completion mode.

SillyTavern's aggressively useless STScript allegedly can increment variables but even macros don't work right/consistently in instruction templates so I'm not even going to try it.
>>
>>106507149
Aw sweet, maybe now more people will understand things we've known about for ages.
>>
>>106507149
base models are more honest (at least internally) about their certainty per token
>we've inadvertently designed the training and evaluation process to reward confident, even if incorrect, answers
this has been a known problem with how they approach instruct training, another paper just stating the obvious
>>
>>106505490
Those men in the picture should have gay HIV sex with one another.
>>
>>106507180
>LongCat-Flash-Chat
>>
>>106507258
fujo go home
>>
>>106507268
Do fujo's dream of HIV sex?
>>
>>106507258
Based fujo.
>>
This is the 1.5B model, generated using the demo. Pretty amazing stuff...
https://litter.catbox.moe/0a895dbvq9a21sya.wav
If you need sound sources try this website
https://www.soundboard.com/category
>>
>>106507258
ahh ahh fujo...
>>
>>106507263
>To mitigate potential contamination from existing open-source benchmarks and enhance evaluation confidence, we meticulously constructed two new benchmarks: Meeseeks (Wang et al., 2025a) and VitaBench.
>>
>>106507409
>Meeseeks
This can't be real.
>>
File: miggy.jpg (190 KB, 992x1487)
>>106507288
>eat poster for dinner
Do it, I dare you
>>
>They think vibevoice is good
https://voca.ro/1ovxYUlilVV4
>>
File: IMG_8543.jpg (1.84 MB, 4030x2197)
>>
>>106507499
>https://voca.ro/1ovxYUlilVV4
for a second i thought you did this but in a different voice
https://vocaroo.com/1mUGlhbVCFvm
>>
>>106506094
I gave up on python for now and got the goofs for the RWKV ~3B model, and as I expected, speed seems to be in the same ballpark as Qwen3 4B.
I guess real benefits should manifest at longer context.
I'll need a more proper testing rig for this one instead of just typing "write a poem about beauty of young girl's navel" in the chat and looking at console logs.
But I'm sleepy so maybe tomorrow.
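A minimal harness for that kind of comparison: time a generation callable and report tokens/second at growing prompt lengths. The `generate(prompt, n_predict) -> tokens_produced` interface is an assumption; in practice it would wrap llama-cpp-python or an HTTP call to a local llama.cpp/RWKV server:

```python
import time

def measure_tps(generate, prompt, n_predict):
    """Run one generation and return (tokens_produced, tokens_per_second)."""
    t0 = time.perf_counter()
    produced = generate(prompt, n_predict)
    dt = max(time.perf_counter() - t0, 1e-9)  # guard against a zero-width clock tick
    return produced, produced / dt

def sweep(generate, base_prompt, context_sizes, n_predict=128):
    """Repeat the measurement at several prompt lengths to see speed vs. context."""
    results = {}
    for n_ctx in context_sizes:
        prompt = base_prompt * (n_ctx // max(len(base_prompt), 1) + 1)
        results[n_ctx] = measure_tps(generate, prompt[:n_ctx], n_predict)[1]
    return results
```

Run the same sweep against the RWKV goofs and a similarly sized transformer; if RWKV's recurrent architecture delivers, its tokens/second should stay roughly flat as context grows while the transformer's degrades.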
>>
>>106507499
>>106507514
From where are you getting voice examples? That's pretty hard. Plus most clips downloadable clips are often pretty noisy or have background music.
>>
>>106507539
>most clips downloadable clips are often pretty noisy or have background music.
this might help
https://github.com/Anjok07/ultimatevocalremovergui
>>
>>106507499
that sounds perfectly fine to me, what problems do you have with it exactly?
>>
>>106507512
I like these Bakas and Dipsy
>>
>>106507548
Thanks. I'm also figuring out why inference_from_file.py won't use CUDA. CUDA should be enabled by default...
>>
>>106507514
Good idea. Here's my try: https://voca.ro/13jnfkR0QnPf
>>
>>106507149
I thought they hallucinate because when a completion is poorly represented in the dataset, the model generates a probability set that's likely to pick a bad token.
Post training with more "I don't know" answers might help, yeah. Though it would have to pull the weights pretty strongly to overcome all the other non confident possibilities.
>>
>>106507553
That was the latest gptsovits, not vibevoice. Eat barely 4GB of VRAM and took 2s to generate that. I still can't understand the hype over M$ new toy.
>>
>>106507589
this one sounds pretty fucking bad on the other hand
>>106507596
ah, that explains why it completely shat the bed when it got to grotesqueries; sovits's vocabulary is utterly piss poor and almost a complete deal breaker for me
>>
File: bakas.mp4 (2.98 MB, 1280x672)
>>106507512
>>
Don't you dare!
>>
>>106507616
adorable
>>
>>106507613
I used whisper on your sample to get the text so it's understandable. The pronunciation is easy to fix if you pass the arpabet transcription directly (I integrated it in my api)
>>
>>106507654
I'm not gonna deal with all that when VibeVoice sounds five times more natural and can say pretty much every word I've thrown at it without fiddling around with syntax and shit. And again, I just drop a minute-long sample to clone instead of going through the tiresome and lengthy training process for each new voice with Sovits.
Sovits can sound fine when you carefully cherry-pick the good generations, but even then it tends to sound stilted; Sovits is like 2 generations behind the curve here.
>>
>>106507704
Well we will see if they drop their training script first, then I might give it a go. Not being able to finetune it is doa for me
>>
>>106507725
I don't see that ever happening.
>>
>https://github.com/ggml-org/llama.cpp/pull/15327
So does this really mean we can finally use models like how they do in image gen, where you just download the base model and the loras you want?
>>
>>106507725
I can see that happening.
>>
>>106507763
>where you just download the base model and the loras you want?
I can already do that. You could for a while.
aLoRA is about changing those during runtime, right?
>>
>>106507800
>I can already do that
So why don't any of us just do that then? Why do people still upload and download the merged weights?
>>
>>106507824
Because the average retard user can't even get a single-click installer working without copious amounts of hand-holding. Trying to explain loras and usage would be asking too much and would hurt download stats. Easier to just provide a plug-and-play model.
>>
>>106507824
Sorry, meant to write
>I'm pretty sure we can already do that
Also, back in the day, there used to be a couple of LoRas out and about.
Notably, SuperCOT and SuperHOT.
Hell, in the PR itself there's a normal LoRa alongside the aLoRa.
For some reason, we all just decided to distribute pre-merged weights instead of just the LoRa, no idea why.
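Nothing technically stops the LoRA distribution model today: llama.cpp can convert a PEFT adapter to GGUF once and stack it on the base at load time. Paths below are placeholders, and exact flag names may vary by build (check `llama-server --help`):

```shell
# one-time: convert a PEFT adapter to GGUF (script ships with llama.cpp)
python convert_lora_to_gguf.py ./my-adapter --base ./base-model-dir

# serve the base model with the adapter applied at full strength
llama-server -m base-Q8_0.gguf --lora my-adapter.gguf

# or blend it in at partial strength
llama-server -m base-Q8_0.gguf --lora-scaled my-adapter.gguf 0.5
```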
>>
>>106507835
But local image gen has more users and they deal with needing to mess with loras fine.
>>
>>106507704
https://litter.catbox.moe/rehari2tvedhwccm.wav
>>
>>106507616
Ty, saved.
>>
>>106507863
Loras are basically required for image gen, so the frontends make them a major component and easy to add/set. They're less important for giant text models that can't be so easily changed, so they're basically an afterthought. It means messing with adding them to the scary command-line arguments instead of a file selection field on a web interface.
>>
>>106507866
kek
>>
>>106507866
wtf i love sovits now
>>
>>106507866
Depending whether that's sovits or not you either proved his point with how lifeless that sounds or vibe sucks
>>
>>106508067
Sir, this is 1.5B VibeVoice.
>>
>>106508067
It sounds stilted as all fuck. My money is on it being soviet.
>>
>>106508090
How long did that take to generate? I have the large weights downloaded from the torrent link posted a couple threads ago, but I want to know whether it's worth using the big version or the smaller one (haven't tested either yet)
>>
How are you guys using VV? Any rentry for retards to setup with ST?
>>
>>106504274
Where the hell are you finding the money for all the gpus to run this shit
>>
>>106508142
Steady employment.
>>
>>106508138
>>106501145
>>
>>106508142
Money just appears in my bank account every month. It's crazy.
>>
>>106508142
GPUs? Poor people like us CPUmaxx
>>
>>106508212
I hope you at least tell your parents thank you
>>
File: 21522 - SoyBooru.png (46 KB, 457x694)
Was 'berry the worst marketing campaign that started the downfall of OpenAI and killed the hype?
>>
multi-token prediction status?
>>
>>106508193
I mean, I could afford around 10k worth if I really wanted to, but there are better things I could do with 10k
>>
>>106508399
lazy ggergachod won't do the needful, kindly ask ikawrakow
Or wait until cloud models can code it for us, we better not be hitting the wall.
>>
>>106508121
1.5B is comparable to SDXL image gen speed at 1024x1024.
Just use the large model if you can.
>>
>>106508415
Such as?
>>
can a rtx pro 6000 gen vibevoice large in real time?
>>
File: moatboy at google hq.png (1.83 MB, 1024x1024)
1.83 MB
1.83 MB PNG
Any Google insiders here? How are Gemini 3 and the Gemma update working out? Did you hit the wall too like the rest?
>>
https://voca.ro/14hcU3N3ZLxZ
>>
>>106508142
Having money saved up from previous work. I have a 3090 but spending on more ram seems a lot more worth it now compared to adding an extra gpu due to all the fuckhuge models coming out recently.
>>
>>106508444
Gemini? We're now simulating the next generation of LLMs within our Genie 3 world model that uses its capabilities to manifest a SOTA llm writing responses within its virtual world.
>>
>>106508461
How safe is it?
>>
>>106508447
Is that your voice?
>>
>>106508461
How good are they at math and coding?
>>
>>106508480
>>106508506
The only questions investors care about
>>
>>106507866
This is just a recording, you can't fool me!!
>>
>>106508425
Student loans, this year's Roth contribution, emergency fund, saving for a house (hopefully the market crashes), paying a lawyer to research a business idea I've been sitting on
>>
File: afis.jpg (18 KB, 516x532)
18 KB
18 KB JPG
My AFIS roleplay just got voices and they are GOOD! VibeVoice really is impressive. There's probably an easy way to integrate it with ST as well.
>>
>>106508552
Post an example.
>>
>>106508552
Large is pretty sensitive to how good the input voice data is. You want clear, smooth studio quality and it will get close to it, it has a massive range in type and age of voice.
>>
English is cool and all, but I'm not moving off sovits unless someone can demonstrate jap abilities better than what i get with my custom trained model paired with clean samples.
>>
VibeVoice recognized gluck gluck gluck as blowjob noises, lmao I'm feeling real unsafe now
>>
>>106508621
We need like a list of sounds it recognizes, it seems a bit random
>>
File: Taylor County PSA.jpg (425 KB, 2655x1500)
425 KB
425 KB JPG
>Real niggas listen to what they feel in they gut after a long shift at the warehouse or when they ridin’ out to the track to hustle horses or pedicabs. If ya real, ya feel: trap beats, Drill tempo (especially from UK drill, or Miami slime), but also music a man love his daughter to.
w-what?
>>
>>106508641
lolwut
>>
>>106508596
https://vocaroo.com/12A6GA08pA5C
This is with ~10s of uncleaned input audio per voice/speaker. The original voices are also low fidelity, that's not VibeVoice crushing them btw.

>>106508604
Yeah, I've been playing around with it for a bit now and it seems quite versatile. I wonder if there's a way to get it to do laughs or perhaps precisely fiddle with the inflection mid sentence? I think I remember doing something like that with TortoiseTTS a while ago.
>>
>>106508831
It seemingly tries to emulate bitrate and noise of the original clip and maybe even exaggerate it.
>>
>>106508848
Wouldn't say it exaggerates the effect. Sounds pretty spot on to me, especially Fox. Only thing I'd like to improve is the inflections and the random bouts of background noise. Probably could all be fixed with better and longer input voices. I should probably clean it up a bit and give it more than 10 seconds per voice.
>>
>>106508831
Are you just using the default settings? I'm getting good results in between ones that just spazz the fuck out lol
>>
>>106506094
>I am curious about supposed speed benefits
RWKV is one specific state-space model (SSM) architecture. The chief practical difference from transformers is an "infinite" lossy context. As the context grows, runtime performance won't degrade, but the memory of older tokens will gradually fade. SSM proponents also argue that the context compression intrinsic to SSMs produces superior results in some use-cases.

Any RWKV models I've played with were retarded, seemingly from the training regime rather than architecture. Bo Peng claims the newer models are at least on-par with transformers of the same size. I'm awaiting the newest 14B to check it out again.
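As a caricature of the constant-cost-per-token property (NOT the real RWKV recurrence, just an exponential-decay toy with made-up numbers):

```python
# Toy sketch of the SSM idea: a fixed-size state updated once per token,
# so memory cost is constant and older tokens fade geometrically.
# This is not the actual RWKV update rule, purely an illustration.
def step(state, token_embedding, decay=0.9):
    return [decay * s + (1.0 - decay) * x
            for s, x in zip(state, token_embedding)]

state = [0.0, 0.0]
for tok in ([1.0, 0.0], [0.0, 1.0], [0.0, 1.0]):
    state = step(state, tok)
# The state stays two floats no matter how long the context gets;
# the first token's contribution has already decayed by decay**2.
```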
>>
>>106508641
https://vocaroo.com/141Mzkn5YKGh
>>
>serious
Is there an open source vision LLM with an open license that's on par with Gemma? I can't find one that's on par with abliterated Gemma cause they are all STEM benchmaxxed.

I guess you can say I'm looking for the Nemo of vision, but sadly Pixtral doesn't cut it either.
>>
what happened to petra? i just saw the name somewhere and made me think of /lmg/ kek
>>
>>106509086
The official RWKV models will never be good because they are trained on EleutherAI-sourced open training data on a shoestring budget. It will take a commercial company to make something half-decent with this architecture.
>>
do we have any prompting guides for vibevoice? curious if there's any way to control it beyond just a simple script
>>
any way of using my 6700xt with my 9070xt for 28gbs of vram?
>>
>>106509719
yeah, why wouldnt there be? just plug them both in. they will run at half pcie bandwidth but that wont really make much of a difference
>>
>>106509086
>I'm awaiting the newest 14B to check it out again
Is this actually happening?
>>
File: 1608766070516.jpg (5 KB, 250x245)
5 KB
5 KB JPG
So when I use safetensors files am I basically running the model at FP16? Do I need to find GGUF files if i want to use say Q8/Q6?
>>
>>106509994
Yeah usually, but it depends on your mood.
>load_in_4bit=True
>>
>>106509994
yes
>>
>>106509994
this >>106510026 is worse
>>
>>106508436
An old Ampere A6000 takes about 25s for 20s of speech, so I'd hope so.
>>
>>106505432
download pinokio, there is already a webui fully working under community scripts. It works better than the comfyui version too lol
>>
>>106509820

>>106506232
>planned
>>
>>106506232
>>106510181
I've been doing some research. It sounds like they're training for 100 different languages with a meme vocabulary/tokenizer, which might be degrading their results. I also wonder if there's been enough hyperparameter testing with respect to the hidden state size. I also can't help but notice no one is using their last 14b model; is it just undertrained, or is there a fundamental issue with scaling this architecture?
>>
>>106510226
As many anons said, RWKV models have always been underperforming memes; that's why no one uses them outside of MS for some reason, but then again MS gave us Phi so they're no strangers to weirdly useless models.
>>
>>106508142
Scamming VCs.
>>
>>106510238
I have been playing around with the 3B model and it gives me hope, I don't agree that it's a meme. Transformers will probably always be better if kv cache size is a non-factor on your machine but I think for slow boil chads RWKV might save local.
>>
>>106510254
>RWKV might save local.
That's been the sentiment of some since before llama even existed... See old comparison in picrel
>>
>>106509994
safetensors and GGUF are just file formats.
They can in principle both store arbitrary data, though ggml (the library providing GGUF) has a particular focus on quantization.
Rule of thumb: safetensors is for Python-based projects, GGUF is for projects using llama.cpp/ggml as the backend.
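To make the quantization part concrete, here's a toy version of what a blockwise 8-bit quant stores: one scale per block plus a small integer per weight, instead of a full FP16 value per weight. Illustrative only, not ggml's actual Q8_0 code:

```python
# Toy blockwise absmax quantization, in the spirit of Q8_0.
# Not ggml's implementation; real blocks are 32 weights wide.
def quantize_block(weights):
    scale = max(abs(w) for w in weights) / 127.0
    if scale == 0.0:
        scale = 1.0  # all-zero block: any scale works
    return scale, [round(w / scale) for w in weights]

def dequantize_block(scale, quants):
    return [scale * q for q in quants]

scale, quants = quantize_block([0.12, -0.5, 0.33, 0.0])
restored = dequantize_block(scale, quants)
# Each restored weight is within scale/2 of the original.
```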
>>
https://www.reddit.com/r/LocalLLaMA/comments/1namz1q/hf_releases_3t_tokens_dataset_sourced_entirely_from_pdfs/
More high quality curated data to reduce our reliance on toxic webslop.
>>
>>106510315
We don't need "high-quality" data, we need data that is relevant for the models' primary end uses. Improved performance on synthetic benchmarks is a red herring and is not an indicator of quality anyway.
>>
>>106510315
Ignore the other guy, this is based. We need long context data like this so models will stop being retarded after the first 8k tokens
>>
>>106510315
Actual link:
https://huggingface.co/datasets/HuggingFaceFW/finepdfs
>>
>>106510347
Utterly pointless when the data of interest (conversational SFT data) is all short and the models are still mostly pretrained on 2k-8k tokens context anyway because of the quadratic costs of attention.
>>
>>106510348
>https://huggingface.co/datasets/HuggingFaceFW/finepdfs

>As we run out of web pages to process, the natural question has always been: what to do next?
Fuck right off, this wouldn't be a problem if you didn't filter 99% of it
>>
>>106510359
>the models are still mostly pretrained on 2k-8k tokens context
There's nothing stopping us from training the last ~3T tokens at a much higher context window, we just need someone to take the leap. It was recently discovered that labs are overspending on training and can lower their batch sizes without degradation in results, we just need a lab that has their priorities straight and actually cares about context length beyond needle in a haystack benchmarks
>>
>>106510393
>There's nothing stopping us from training the last ~3T tokens at a much higher context window,
Isn't that exactly the kind of thing they do already
>>
>>106510398
They do to some degree but that's why the pdfs are based, the more long context training the better
>>
>>106510393
Long-context performance is task-dependent. Pretty much all officially released models have received long-context training, but not with multi-turn conversations. In practice they just want to end the conversations after a few turns, because most existing conversational data is like that.
>>
>>106510342
And what kind of data would be relevant to ERP?
>>
>>106510426
Other than examples of ERP itself, lots of common-sense and/or obvious data that only exists in very diluted form in random web documents.
>>
>>106510418
A multi-turn conversation is just a flavor of text, just a presentation layer. Having long context capabilities is much more fundamental and needs to be done in pretraining. You could realistically train a base model to learn a conversation format in only a couple million tokens
>>
>>106510426
visual novels
>>
>>106510426
whatever they trained the first character.ai models on or the new ones which now successfully capture the spirit of their 2022 models
>>
>>106509035
Same here, I mostly fiddle with temp, though. Seems very sensitive to that. Lowering it to 0.92 helps.
>>
>>106510426
- different types of relationships and development thereof
- physical range of motion
>>
What's the easiest way to get vibevoice running? The official repo needs some Docker bullshit. Has anyone made a .cpp version of their shitware that just werks?
>>
best model for studying?
>deepseek
good enough for solving homework, needs very thorough prompting to tutor though
>kimi
has not been good for homework in my experience

i have not tried GLM 4.5 yet, gpt-oss-120b turns out to be the best at tutoring, maybe im prompting badly, either way help
>>
>>106510703
That's some of the information a LLM will probably only acquire after getting trained on several trillion unfiltered random tokens, hopefully without getting averaged out in the various training batches. I still think that the way LLMs are usually pretrained is not conducive to learning this sort of stuff efficiently.
>>
>>106510238
>As many anons said RWKV models have always been under performing memes, that's why no one uses them outside of MS for some reason, but then again MS gave us Phi so they're no strangers to weirdly useless models.
you forgot bitnet
for some reason MS has a lot of copers who dream of a world where toasters can run models
maybe it's in the jeet blood
>>
>>106510781
i am not buying any more of your gpus jensen
>>
https://github.com/resemble-ai/chatterbox
https://xcancel.com/heysehajsingh/status/1963640592661188857
https://huggingface.co/ResembleAI/chatterbox
How does it compare to Microsoft's voice model?
>>
>>106510850
worse
>>
>she whispered, her voice barely above a whisper
>>
>>106507824
I am too retarded to understand the details, but I remember seeing some graphs ITT showing that a proper finetune is better/more balanced than LoRAs.
>>
>>106507824
iirc one of the reasons is that loras interact with quanted models differently when applied on top instead of merged, and since there are so many different quant levels that gets messy, as well as what others said
>>
>>106510426
discord chat logs
>>
>>106510977
yes daddy :uwu_32:
>>
>>106510367
>Fuck right off, this wouldn't be a problem if you didn't filter 99% of it
I find it funny that most companies tried their best to filter out anything explicit yet kept syrupy "erotica" romances (which is why we have shit like >>106510919 as the standard of chatbot/AI story writing) and random web page backend errors that shouldn't be in any dataset.

>>106510990
He's right anon, discord is what made cai so alive and different from most other models despite their model being outdated and dumb.
>>
>>106510367
it's insane they thought saying something like that was a good idea, those people really believe the internet is just reddit and twitter, my god...
>>
I know llama.cpp is pretty bad at parallel requests; you need to set double the context length if you want n=2 parallel requests, and so on.
How good is exllamav3+tabbyapi in that regard? Now it has tensor parallel support and it doesn't require 2/4/8 GPUs to work; you can have a mix of VRAM sizes too. Are parallel requests handled like in vllm? Does it take up more VRAM to serve more requests at the same time?
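For context, the llama.cpp behavior in question (hedged sketch; the flags are real, the model filename is a placeholder): the `-c` context is split evenly across `-np` slots, so two parallel requests at 8k each need the total doubled:

```shell
# -c is the TOTAL context, divided across -np slots (8192 per slot here).
./llama-server -m model.gguf -c 16384 -np 2
```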
>>
>>106510850
a bit worse but performance is real time with a mid-range gpu
>>
>>106509086
>The chief practical difference with transformers is an "infinite" lossy context. As the context grows, runtime performance won't degrade, but the memory of older tokens will gradually fade.
If it's infinite context, but it's lossy, then what's the point of it being infinite if it will eventually forget things just like transformers?
>>
>>106511189
>runtime performance won't degrade
>>
>>106511228
I too love fast retardation
>>
I too love 4chan
>>
>>106510630
Thanks. I think the problem I was having was the node resampling the audio down to 24000 from what I saved in Audacity. Saving straight at 24000 seems to cut the artifacts a lot.
>>
>>106511430
Oh, good to know, never would've guessed. Thanks for the tip! What character are you trying to clone, btw?
>>
>>106511486
I am mostly testing still to get the best settings so a bit of everything, no real specifics.
>>
>>106511486
>>106511430
speaking of tips, i found that setting cfg to 3 and steps to 5 produces very accurate character impersonation. the comfy node i'm using had the cfg clamped at 2 for no good reason so I had to edit the script
>>
>>106511507
Hadn't even considered that. Thanks, I'll test that too.
>>
I understand that training is not cheap, but the RWKV people are really gimping themselves with small model sizes; performance is just not a bottleneck at that scale.
>>
>>106511507
Steps to 5?? That seems oddly low. My comfyui node had it at 10 and I often cranked it to something between 12 and 20. Bigger number more betterer :)
>>
>>106511581
You're absolutely right!
>>
>>106511581
I learned it from image gen where if you have a very strong checkpoint or lora you can do something like karras - gradient_estimation - 15 steps and it looks great instead of having that signature ai look you'd get with more steps and cfg. sdxl btw
>>
File: 1747951068990741.jpg (93 KB, 1216x684)
93 KB
93 KB JPG
>>106504274
So it's my understanding that the only thing Microsoft got rid of on their official repos was the Large weights, but the code itself is largely unchanged. Is that correct? Or did they fuck with the code before re-release too?
>>
>bytedance/seed-oss-36b
Is this thing any good?
>>
>>106511189
>they reinvented context shift
>>
Has anyone tried using hermes 4? I'm getting {{user}} tokens at the natural ending point of the response but it doesn't actually end there. 2: How do we get it to use reasoning mode in local? I swapped over to generic llama 3 instruct + the enclosed system prompt but it's not doing anything.

Tested on 4.25bpw_exl3 on tabby / ST
>>
>>106510367
>Fuck
Using advertiser-unfriendly language like this is exactly why this whole domain gets filtered out.
>>
>>106511726
They removed all of the code entirely before putting the repo back. You can check for yourself.
>>
>>106511544
If anything it's the opposite. Instead of splitting their limited training compute across 5 sizes including 7B and 14B, they should just train one or two small models well, ideally with transformers equivalents so that they can show the architecture does not adversely affect output quality.
>>
File: 1750989943069498.png (162 KB, 761x680)
162 KB
162 KB PNG
so this is what the superintelligence team has been working on...
>>
>>106511910
llm 2.0 baby
>>
>>106511910
RAG is back, baby
cline and augmentcucks are seething
>>
>>106511910
>Rice University
>>
>>106511868
It's incredibly easy to use, a field day for scammers. I don't think they were concerned about some chud making lolita sex noises. Scammers and voice cloning is more of a real issue.
>>
>>106511910
>>106512007
kek, it's really written "rice university"
>>
>>106511910
yup, just use rag bro
>>
>>106512174
Maybe it was just its abilities in nsfw in general.
>>
Has anyone gotten VibeVoice 7B to generate a long script? I'm trying and it always ends early after 4-5 minutes.
>>
>>106510709
It doesn't need docker. Maybe learn how to read. You have two options:
>text inference python demo via command line (you can add your own voices and it reads a text file..)
or
>install CumragUI and use comfui-vibevoice node
>>
>>106512174
This isn't the first TTS model with voice cloning support. If that was the problem they wouldn't have left the 1.5B up and they would have had an internal ban on making anything with voice cloning capability in the first place.
>>
>>106512210
Never mind, I'm stupid.
>>
>>106512236
>CumragUI
wtf is that
>>
>>106512262
A schizo way of writing ComfyUI.
>>
SAAAAAAARS
https://huggingface.co/YannQi/R-4B
>>
>>106512279
sirs*
>>
>>106512236
both options are total dogshit because using Microsoft's webui or their script requires reloading the model and messing around with temporary files every time you do anything
and using cumshitUI was way slower than their original code for some reason (it only used 1 core and took more than 4x longer).
I'm gonna try and vibe code a better UI for it maybe, one that isn't a cancerous WEB UI.
>>
>>106512278
oh ok, anons always make finding stuff in the archives very easy, I see
>>
>>106512307
>>106512307
>>106512307
>>
Just read their oneshot script and add a while loop and a thing that reads new prompts. You can get an LLM to do it for you. It's not that hard.
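A hedged sketch of that idea, with `load_model` and `synthesize` as stand-ins for whatever the demo script actually calls:

```python
# Load the model once, then loop reading prompts, instead of paying
# the full model-load cost on every run. Both callables are
# placeholders for the real demo's load/generate functions.
def serve(load_model, synthesize):
    model = load_model()              # expensive, done once
    while True:
        text = input("text> ").strip()
        if not text:                  # empty line exits
            break
        synthesize(model, text)
```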
>>
>>106512285
You are like a spoiled little child.


