/lmg/ - a general dedicated to the discussion and development of local language models.

Previous threads: >>106512307 & >>106504274

►News
>(09/05) Klear-46B-A2.5B released: https://hf.co/collections/Kwai-Klear/klear10-68ba61398a0a4eb392ec6ab1
>(09/04) Kimi K2 update for agentic coding and 256K context: https://hf.co/moonshotai/Kimi-K2-Instruct-0905
>(09/04) Tencent's HunyuanWorld-Voyager for virtual world generation: https://hf.co/tencent/HunyuanWorld-Voyager
>(09/04) Google released a Gemma embedding model: https://hf.co/google/embeddinggemma-300m
>(09/04) Chatterbox added better multilingual support: https://hf.co/ResembleAI/chatterbox

►News Archive: https://rentry.org/lmg-news-archive
►Glossary: https://rentry.org/lmg-glossary
►Links: https://rentry.org/LocalModelsLinks
►Official /lmg/ card: https://files.catbox.moe/cbclyf.png

►Getting Started
https://rentry.org/lmg-lazy-getting-started-guide
https://rentry.org/lmg-build-guides
https://rentry.org/IsolatedLinuxWebService
https://rentry.org/recommended-models
https://rentry.org/samplers

►Further Learning
https://rentry.org/machine-learning-roadmap
https://rentry.org/llm-training
https://rentry.org/LocalModelsPapers

►Benchmarks
LiveBench: https://livebench.ai
Programming: https://livecodebench.github.io/leaderboard.html
Code Editing: https://aider.chat/docs/leaderboards
Context Length: https://github.com/adobe-research/NoLiMa
GPUs: https://github.com/XiongjieDai/GPU-Benchmarks-on-LLM-Inference

►Tools
Alpha Calculator: https://desmos.com/calculator/ffngla98yc
GGUF VRAM Calculator: https://hf.co/spaces/NyxKrage/LLM-Model-VRAM-Calculator
Sampler Visualizer: https://artefact2.github.io/llm-sampling

►Text Gen. UI, Inference Engines
https://github.com/lmg-anon/mikupad
https://github.com/oobabooga/text-generation-webui
https://github.com/LostRuins/koboldcpp
https://github.com/ggerganov/llama.cpp
https://github.com/theroyallab/tabbyAPI
https://github.com/vllm-project/vllm
►Recent Highlights from the Previous Thread: >>106512307

--Open vs closed LLM progress and dataset efficiency debates:
>106512347 >106512375 >106512423 >106512595 >106512445 >106512517 >106512610 >106512693 >106513204 >106513240 >106513275 >106513359 >106513448 >106513451 >106513465 >106513591 >106513485 >106513499 >106513545 >106513567 >106513584 >106513595 >106513736 >106514154 >106513773 >106513803 >106513823 >106513864 >106513969 >106514038 >106514056 >106514094 >106514111 >106514143 >106514258 >106514291 >106514299 >106514449 >106514475 >106514486 >106514504 >106514556 >106514592 >106514608 >106514671 >106514710 >106514725 >106514740 >106514750 >106514765 >106514607 >106514622 >106514694 >106514888 >106514917 >106514932 >106514961 >106514979 >106515001 >106515057 >106514968 >106515015 >106515061 >106515105 >106515119 >106515139 >106515181

--ERP model finetuning with AO3/Wattpad datasets:
>106512933 >106513094 >106513181 >106513210 >106513219 >106513256 >106513670 >106513686 >106513740 >106514579 >106514614 >106514787 >106513222 >106513281 >106513313 >106513321 >106513397 >106514052

--VibeVoice TTS voice cloning and conversational capabilities:
>106515071 >106515193 >106515199 >106515236 >106515623 >106515246 >106515275

--Dataset specialization vs diversity tradeoffs in model training efficiency:
>106514377 >106514388 >106514457 >106514498

--Memory limitations in transformers vs. potential SSM improvements:
>106513859

--VibeVoice ComfyUI integration issues and VRAM requirements:
>106512932 >106513000 >106513126 >106513205 >106513235 >106513319 >106513338

--Troubleshooting erratic Harmony 20B behavior in tavern interface:
>106513566 >106513708 >106513756 >106513772

--Improving VibeVoice long audio generation quality via CLI adjustments:
>106514051 >106514086 >106514195 >106514437

--Miku (free space):
>106514325 >106514549 >106515574

►Recent Highlight Posts from the Previous Thread: >>106512310

Why?: >>102478518
Enable Links: https://rentry.org/lmg-recap-script
>>106516362
>https://voca.ro/1gZ6xankFzjP
many such cases
/vibevoice general/
All this training data debate is retarded. We just need to create a dataset. It will take 3 lifetimes. Think cathedral.
We need a blueprint and a chunk of digital "land" and we can get going. Craftsmen will show up if the plan is ambitious enough to move the spirit of men.
>>106516395
Whatever happened with higgs, anyway? I never got around to looking into it. What made vibevoice blow up comparatively? Other than everyone wanting it because it was taken away from them.

>>106516389
just like with vibe-coding in general, we don't need it to one-shot a whole project, we can just do subroutines. Those can be as compact as 2 instructions.

>>106516409
It's kinda decent and voice cloning just works™ by dragging a sample audio file in a folder.

>>106516395
Keep it all here retard, the hype will die down after a couple weeks and /gee/ doesn't need another fucking ai gen thread to clog it up

>>106516368
did she died?

>>106516250
https://voca.ro/1ogB0LHKA0bU

>>106516147
>gets pretty rough even with that when you try to do a 40 minute script
What is your qualified opinion on GFG?

>>106516432
I was not suggesting a partition, merely commenting on the current state of this thread
Who should I sample for the best mesugaki voice?
>>106516499
>anon added label: Important
>>106516402
if you aren't worried about compute cost, the available datasets are already more than enough. just hit a 70b with common crawl, then use it to bootstrap an instruction dataset, and then use that to refine it even further. the entire debate is because people are trying to do it with minimal data and minimal compute; it will never really work.
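A minimal sketch of that bootstrap step, assuming a llama.cpp server on localhost:8080 (the /completion endpoint and its prompt/n_predict/content fields are real; the seed tasks and prompt wording here are made up):

import requests

SEED_TASKS = [
    "Summarize the following paragraph in one sentence.",
    "Explain why the sky is blue to a child.",
]

def new_instruction(seed):
    # ask the pretrained model to invent a task similar to a seed task
    prompt = f"Here is an example task:\n{seed}\nWrite one new, different task:\n"
    r = requests.post("http://localhost:8080/completion",
                      json={"prompt": prompt, "n_predict": 128, "temperature": 1.0})
    return r.json()["content"].strip()

def answer(task):
    # then have the same model answer its own generated task
    r = requests.post("http://localhost:8080/completion",
                      json={"prompt": task + "\n", "n_predict": 512})
    return r.json()["content"].strip()

pairs = []
for seed in SEED_TASKS:
    task = new_instruction(seed)
    pairs.append({"instruction": task, "output": answer(task)})
print(pairs)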
Interesting, I wonder if the problem with the comfyui implementation lies in the step count. Higher than 10, it becomes more and more inconsistent. Now tested between wildminder's and diodiogod's TTS suite: the TTS suite is far more consistent and, while it still has similar issues, keeping the step count below 20 seems to give some form of stability.
This test, at 10 steps, with a very background-noisy sample, honestly blows away everything I was just doing at 20-30 steps before.
It's always something with this shit, isn't it? lmg and ldg anti-comfy schizos vindicated!
https://voca.ro/1aYkwUddVRDk
>>106516451
>https://voca.ro/1ogB0LHKA0bU
please post the script and the ref audio
For research purposes, it is understood
https://x.com/TimfamousTTV/status/1964084712994951555
>>106516499
https://www.youtube.com/watch?v=1w3o1VPzLuI

>>106516499
Your bum mother.

>>106516577
Too short.

>>106516450
not yet, still being digested

>>106516561
You can have the script, but you gotta go get your own audio.
https://pastebin.com/wr3ASHkN

>>106516520
For me, it's more about determining the minimum amount of data and the composition required to train an LLM from scratch to exhibit basic intelligence and common sense rather than creating a model useful in practice. Random web data isn't useful for that.

>>106516499
(you can easily find the full asmr audio)
https://www.youtube.com/watch?v=RIgDUDyei4g
What is the max file duration VibeVoice accepts?
>>106516561
ms safety team's worst nightmare (no it's not actual agi, it's porn)

>>106516574
grim
Summer flood is over, I guess. Expectations for the new era? Largestral 3? Llama 4.5? Will Gemma 4 be a lot less cucked and finally make it to "notable"? Will DS make a comeback with R2, or have they hit the wall like everybody else? Will Kimi get a second era of dominance? Will Qwen finally go from notable to top? Will incremental updates continue, or will we get a jump like with R1?
Here are the counts of tops so far:
- Meta: 3 eras
- Mistral: 3 eras
- DS: 2 eras
- Cohere: 1 era
- Kimi: 1 era
>>106516631
how do you define that basic intelligence and common sense? have you tried training a model on TinyStories?

>>106516520
compute will naturally get better and cheaper.
The kind of data curation that we're interested in will not happen unless we're proactive and deliberate about it (and definitely won't happen in the open, even if a CAI type org does it)

>>106516451
Large or 1.5?
>>106516675
just chunk the files at sentence, paragraph or other grammatical markers and run them through in batches
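A minimal sketch of that chunking, in case anyone wants a starting point (max_chars is an arbitrary limit; tune it to whatever your TTS tolerates):

import re

def chunk_script(text, max_chars=1500):
    # split at sentence-ending punctuation, then pack sentences into chunks
    sentences = re.split(r"(?<=[.!?])\s+", text)
    chunks, cur = [], ""
    for s in sentences:
        if cur and len(cur) + len(s) > max_chars:
            chunks.append(cur.strip())
            cur = ""
        cur += s + " "
    if cur.strip():
        chunks.append(cur.strip())
    return chunks

for i, chunk in enumerate(chunk_script(open("script.txt").read())):
    print(i, len(chunk))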
>>106516707
Google can't stop cucking the hardest, it's in their DNA to be ultra gay

>>106516707
Can you call this period the Drummer's Slop Spam era?

>>106516719
Large.
I just bought an A100 80GB PCIe card off ebay. Had to dip into my emergency fund, but I hope the payoff of running bigger models purely on the GPU is worth it.
>>106516718
if you think that compute is going to get cheaper and better, then the discussion should be on training classifier models to generate the dataset. I had some luck training a few of my own, but the compute cost to run my dataset through the classification network didn't pan out.
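The filtering loop itself is roughly this; the checkpoint name and label are placeholders for whatever classifier you train, not a real model:

from transformers import pipeline

# hypothetical checkpoint: substitute your own trained quality classifier
clf = pipeline("text-classification", model="my-org/web-quality-classifier")

docs = ["some common crawl document ...", "another document ..."]
kept = [d for d in docs if clf(d, truncation=True)[0]["label"] == "HIGH_QUALITY"]
print(f"kept {len(kept)}/{len(docs)}")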
>>106516529
Shit, I think you might be right. I was getting annoyed with it and starting to look into setting up the gradio one, but knocking it down to 10 steps from the 30-50 I'd been using actually seems to have improved the quality a lot.
I'm gonna need to keep futzing with it to make sure I'm not just imagining the changes.

Even Gemini knows:
"Google's original mission statement, which remains its official mission today, is: "To organize the world's information and make it universally accessible and useful". Based on its business practices, revenue model, and ethical criticisms, Google's operational mission is distinct from its stated mission. While the official mission represents its foundational purpose, the company's behavior reflects an overarching, inferred mission: to own and monetize the pathways to the world's information and digital activity through dominance in online advertising, search, and data collection."

>>106516705
heh
>>106516751"Drummer's Slop Spam era?" If he pays me for it, I will.
>>106516806bark up their tree about it for me will ya? it's 100% something along those lines. I'm genning an entire audiobook at 10 steps and its far far far more stable, compared to earlier where most samples were speaking in hieroglyphics.
>>106516817edgelord model take 2154
>>106516757
This is quite possibly the stupidest thing I've read all week. 80 GB is nothing in the face of the MoE models that are the only ones worth running.

>>106516828
Are you trying to push me to suicide right now?

>>106516837
Do a flip, faggot.
>>106516707
>retarded chart again
I'd redo it myself, but I'll let you fish for attention a while longer

>>106516757
>bought a A100 80gb pcie card off ebay
good
>dip into my emergency fund
it's a hobby, don't ruin yourself over it, use money you can waste

>>106516764
so, sorting piles into smaller piles?
do you think a human still needs to do the final selection of what makes it in?

>>106516718
>compute will naturally get better and cheaper.
I have doubts.

>>106516757
I knew some of you were dumb, but...
Is there an extension or something for SillyTavern that can automatically record memories for a character?
>>106516842
I mean, as long as an emergency doesn't happen before he can replenish it, he should be fine.

>>106516757
>A100 80gb pcie card off ebay.
Did you get a good deal at least? I'd pay $2k for one, tops, personally. There's a lot of 3d-printing shroud and fan bullshit that goes along with those DC cards.
>>106516757
How much money is an "emergency fund"? It's burger hours, so it has to be at least 100k given the absolute state of healthcare over there, right?
>>106516875
As a rule, 6 months of living expenses.

>>106516757
Should have cpu maxxed with 12 channel ddr5 plus an rtx 6000.

>>106516846
if you want your dataset to actually be capable of pretraining a foundation model, you will have to be happy with doing a few spot checks here and there. if you want to make a really high quality fine tuning dataset, then you can review every sample, but it gets pretty out of hand quickly.

>>106516879
He wanted to run models purely on GPU. With that thing, he can run Air Q4 probably at like 50 t/s.

>>106516881
What about with a 300 year time horizon?

>>106516861
I paid a bit over $10000 once taxes are included. I didn't see any for under that, really.
>>106516897
>>106516897
I don't know how taxes work with the listed prices in burgerland, but that sounds like the price of a brand new rtx 6000 with more memory, similar bandwidth and more compute.

>>106516897
>Imagine the amount of tokens you could waste on OR for $10K

>>106516897
do people really
>>106516889
Then just a single rtx 6000 would have been cheaper, with more vram, and new.

>>106516897
lol

>>106516878
Yeah, regardless of the country, usually 5 to 12 months of living expenses depending on the job and how easy it is to replenish it.

>>106516757
that's a big waste of money
extra ram is better than extra vram now

>>106516897
Depending on your jurisdiction you're still in the cooloff period, try to cancel.
Search "Blackwell Pro 6000 96GB" and save yourself massive headaches and also a GPU that won't lose CUDA support nearly as quickly.

>>106516897
ok this has to be a joke, a 6000 pro blackwell is literally this price, brand new
>>106516934
He could save himself even more headaches by getting a 5090 and 128GB of ram.

>>106516757
You still have 30 days to return, lil bro.

>>106516946
Nobody wants to run models at 1 token per second. Stop trying to convince others to waste money on a novelty paperweight just because you did.

>>106516875
>given the absolute state of healthcare over there
You guys do realize we've had health insurance since forever, and since Obama you're legally required to get it or face fines. Even if you have to be on obamacare plans, the worst deductible is like $8000 on the worst tier.
the most valuable use of time is figuring out our own way to cheaply manufacture hardware at home. 3d printing, but for chips. if you actually look at how much all the base materials and electricity would cost to make/acquire, nvidia's and others' markups are in the tens of thousands of times, if not millions in some cases. the further added benefit would be no glowware or backdoors. also we can harass cudadev to write all the support
>>106516891
impossible, the rate of revolution is too high right now. it is likely to stay too high until the collapse of the system as a whole.

>>106516897
You're trolling.

>>106516964
It's a lost cause, youtube/tiktok made it look like no one in the US has or can afford healthcare and hospitals are empty or something.

The VibeVoice Latent Space demons have sent a message
https://vocaroo.com/1edmqG0nl8gP
>>106516716
Not TinyStories, but I've trained a few tiny models on small amounts of other data and I have a rough idea of what could work and what doesn't.
For me the model should know how to engage in simple conversations, know what most common English words mean and how to use them, and show an understanding of cause and effect for mundane actions and events. Coding, math, trivia, reasoning can come after that.

it's always the patrick star IQ motherfuckers with the money to just casually drop like that
or equally the types to walk into a thread like this and lie about doing such a thing, either way funny

>>106516937
true

>>106516990
where'd you get this recording of me trying to get this gay model working in comfyui?
https://voca.ro/1kV0jby11ih6
>>106516937
>>106516934
>>106516955
Okay I admit I didn't realize the newer arch cards were similarly priced. I fucked up. I'm going to try and cancel my order before I do anything else. I am fucking sweating buckets right now. Just to be clear, the "Blackwell Pro 6000 96GB" you're referencing is the same as the NVIDIA RTX PRO 6000 Blackwell with 96GB? Once the cancellation goes through I'll order the blackwell card.
>>106516998
no, I confirm, it's always the most retarded guys with the most money to waste

>>106517010
lmao
>>106517017
you got owned

>>106517017
https://www.pny.com/nvidia-rtx-pro-6000-blackwell-max-q
There are a couple of versions with some minor tradeoffs.
Good luck, I wish I could drop that much on a GPU

>>106516964
>the worst deductible is like $8000 on the worst tier
I pay 100€ per year for the upgraded healthcare plan, and a surgery and two weeks in hospital cost me 0€.

>>106516897
You really should've gotten an RTX Pro 6000 then.

>>106517041
Bro, your hospital is filled with clueless immigrants. Your healthcare is practically free because it's worthless

>>106516990
kek
So is Meta really done with AI at this point?
>>106516499
Aoi Yuuki

>>106517082
meta is going on the rag
Do any of you guys use both local LLMs and cloud providers? I've been experimenting with mistral-small, devstral and qwen3-coder for a while now locally, but also making use of gemini-cli and lumo (free versions of both). Outside of when I need to ask very "personal" questions or such, I find myself wondering if local is even worth it anymore.

>>106516964
I was under the impression that obamacare is what made US healthcare so expensive to begin with.

>>106517082
They could make a comeback at some point since they have a lot of GPUs, but as of right now they are done.

>>106517082
Llama 4.X rumored.
Meta is racing the clock to launch its newest Llama AI model this year
https://www.businessinsider.com/meta-superintelligence-lab-llama-4-new-model-launch-year-end-2025-8
>Meta aims to release its next AI model, Llama 4.X, by year-end, according to two sources.
>The release of Llama 4 models in April faced criticism for underperformance, some developers said.
>A team within Meta Superintelligence Labs is now trying to fix bugs within Llama 4, sources say.
[...]

>>106517135
How tf did zuck let himself get this bamboozled? It's honestly one of the saddest arcs I've seen. Almost makes me feel bad for the guy
>>106517082
>>106517145
>rushing out another slop of llama 4
it's over

>>106517145
>fix bugs
Odd way to say "retrain with even more slop"

>>106517111
You're confused about terminology, cloud providers also run LLMs.
I don't want to touch cloudshit with a ten-foot pole, because I know the inevitable enshittification rug pull will hit at the exact moment I start getting used to it and relying on it for daily life.
Friends don't let friends use someone else's computers.

>>106517157
How does Microshaft have 150k H100s and yet not a single model to their name, relying on OpenAI and Claude for Copilot?

>>106517172
WizardLM wasn't safe enough
how many t/s should i be getting on a Q5_K_XL quant of GLM 4.5 full with 2 3090s and 128GB of RAM? what settings should i be using?
>>106517135
>>106517157
Having the most GPUs does fuck all when you only use 5% of them.
>>106517172
Microsoft focuses on Azure as a service. They're more than happy to let others develop models and make them available through their service.

>>106517172
Microsoft are useless
If you didn't think that before, you definitely should after seeing that chart
Organizationally they are pure lazy, rentseeking trash

>>106517149
1. He listened to safetyists instead of going full Musk 2.0 in a changing political climate
2. Meta kept repeating the same scheme while all other companies were innovating
3. 5% GPU utilization is no joke. 600k H100s work with the efficiency of 30k
4. He put a scamming jeet in charge of his AI team (llama 4 to the moon saars!)
5. He put an actual grifter who was selling gptslop as human data (Wang) in charge of his AI team
>>106517197
1 tok/s if you're lucky

>>106517172
Look at the state of windows

>>106517206
really? because i have been getting around 3t/s with some mostly random settings on oobabooga.

>>106517197
Depends if you're on Windows or Linux. I'd guess 2-4t/s and 4-6t/s for the former and latter with the correct set-up.

>>106517226
i am on linux.

>>106517204
that's what you get for filling your company with jeets

>>106517228
Make sure to fiddle with -ot to put all the layers on one GPU and fit as many MoE experts as you can on both GPUs; it'll take some trial and error.
Why do none of you guys finetune your own models for 4chan usage?
>>106517205
I think the political climate favors safetyists. There's a moral panic about models telling people to kill themselves or pedos using them to make "CSAM"

>>106517250
What usecase?

>>106516994
>For me the model should know how to engage in simple conversations, know what most common English words mean and how to use them, and show an understanding of cause and effect for mundane actions and events.
what size model are you targeting and how much data do you feel is necessary?

>>106517255
automated reply bots to keep me company at night

>>106517204
>Microsoft are useless
Anon, they just 'released' a top tier TTS model.
Historically MS always had good R&D, but completely knee-capped by management.
My favorite video on this subject:
https://youtu.be/vOvQCPLkPt4?t=158
>13 years ago
>casually drop a 1ms latency (1000Hz) touch display prototype
>absolutely nothing comes out of it

>>106517250
just go get the gpt4chan tune

>>106517263
I said "organizationally" on purpose.
Good humans buried by bullshit

>>106517270
>trained on /pol/ exclusively
I think I'll pass

>>106517242
got it to 4.3t/s average on a 1700 token output. good enough i guess

>>106517303
Also if you're on an Intel CPU with P and E cores, use taskset to force it to run only on P cores, should be a decent performance increase.
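For example, on a chip where the P-core threads happen to be CPUs 0-15 (the numbering varies, so check lscpu --extended before copying this):

taskset -c 0-15 ./llama-server -m model.gguf --threads 8

--threads 8 then gives one worker thread per physical P core instead of letting threads wander onto the E cores.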
>>106517350
i have an AMD 3950x

>>106517359
Upgrade to EPYC

>>106517252
The moral panic is the same since 2010, we just moved from:
>smartphone made x kill himself
>smartphone was used for csam
to
>social media made x kill himself
>social media was used for csam
and now
>ai made x kill himself
>ai was used for csam
Honestly nothing much changed, though the new generation is clearly mentally raped by this constant panic.
Based on what has happened on 4chan lately I think it will flood the internet through steg
>>106517252
>I think the political climate favors safetyists.
Sure, and listening to them is retarded and suicidal.
You can do like the Chinese: just lip service to "safety" for journalistic PR while internally not caring that much.

If I'm asking any questions related to coding in any way, shape or form, should I be sticking to coder models? Is there really any harm in me using, say, mistral-small instead of devstral?

>>106517459
I'm using gpt-oss and mcloving it.
Give it to me straight, how bad are my results?
>>106516368
Is it bad that I kind of wish I had friends like you guys in real life? AI and nerdy technology-based shit in general is the only hobby I have (except gaming, I guess, and even then I only play relatively niche games like no man's sky). It's bad enough I'm a bit of an autist in all ways except having an official diagnosis, but not having a popular hobby makes the loneliness worse

>>106517499
I have a dumb question, but what are the numbers after pp (prompt processing) and tg (token generation)? Is it the number of context tokens?

>>106517499
Dunno, but for reference I get 6 tk/s on q8 qwen 3 coder 30b

>>106517545
>is it bad that I wish I had IRL friends who share my interests
No anon, that's not bad. I know how you feel. I'm pushing 30 and have no idea how to make friends IRL. I have a bunch of online friends that live too far away to visit.

>>106517499
Quite bad.

So I downloaded that sweet batch of jai cards, but I notice when I try to 'view all' cards it doesn't load anything more. Does this happen to anyone else? Do I need to troubleshoot the index.html code or could it be another issue? On Arch, if that explains it
https://commonvoice.mozilla.org/en/datasets
Voice bros, look at this nice dataset that I found. Lots of clean and annotated voices for free. You can even pick an accent and age! It even has tranny voices!
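Pulling it through the datasets library looks roughly like this; the dataset id/config and field names are assumptions based on recent releases, so check the dataset card (Common Voice on HF is also gated, so accept the terms and `huggingface-cli login` first):

from datasets import load_dataset

# assumed id/version; stream so you don't download the whole release
cv = load_dataset("mozilla-foundation/common_voice_17_0", "en",
                  split="train", streaming=True)
for sample in cv.take(3):
    # field names as in recent releases: audio, sentence, age, gender
    print(sample["sentence"], sample["age"], sample["gender"])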
>>106517551
pp => number of tokens to process for the benchmark
tg => number of tokens the model will generate
So yeah, pp is basically your context size, and it shows how well the model processes prompts at various lengths in the test. Mind you, that's not the limit of what the model can do; that's separate.
>>106517582
So tg256 means generating 256 tokens? Is that on an empty context?

>>106517575
Can you show off Openhands-lm-32b at full weights or qwen3-coder-30b-a3b at full weights? Both should fit in that card, I believe.

gptsovits is barely 600M (4 small models) finetuned and I think it's still superior to vibevoice 7B for practical usage. Love to talk to my LLM with sub 1s latency
https://vocaroo.com/1KC9TsZqSLag
Unfortunately I don't see it improving much more than that. The dev is lazy, and between V2 and V2proplus only the vits part got slightly better (the gpt got worse and I had to keep the V2 one). The original segmentation method was dogshit too, so I added my own.

>>106517593
I have Q8 already downloaded. Compute is the bottleneck at 3B.

>>106517545
I went to a friend's party the other day. Old friend who I went to highschool and uni with. He introduces me to one of his gun buddies and we don't really hit it off until he brings up the Visa/Mastercard situation, which I have been following. Turns out the dude has been calling them in protest too, and it also turns out he uses mass bot hosted emails to troll people in protest for things prior. He even introduced me to cock.li. I never realized he was that based until we spoke about PC stuff. What I'm saying is you'll find some good people out there. Don't go revealing your power level like an autist, but have tact in bringing up tech; you'll be surprised at the allies and people you can learn from around the corner.
>>106517591
If I understand llama-bench correctly, I believe it does keep the tests independent of each other, so tg256 would be on an empty context.
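For reference, those numbers map directly to llama-bench flags, so you can pick your own test sizes (the flags are real; the model path is a placeholder):

./llama-bench -m model.gguf -p 512,2048 -n 128,256 -ngl 99

-p sets the prompt-processing test sizes (pp512, pp2048) and -n the generation sizes (tg128, tg256); each combination runs as an independent test.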
>>106517616
>He even introduced me to cock.li
If you don't already have dozens of cock.li accounts from before the invites, you aren't very based yourself.

>>106517602
>>106517575
shiet man, that's pretty sick. what do you normally use it for? also is it rented or owned?

>>106517630
true, forever a newfag, but better late than never.

>>106517616
i told my manager that i have an ai girlfriend
now i'm getting weird treatment

>>106517499
Why is openhands 32b so fast and qwen 30b so slow? It doesn't make sense.

>>106517545
>>106517616
let's all meet up and have a massive fucking orgy
or admit that people are generally unreliable and will constantly and always disappoint.
>case in point - mistral 3 large
Does anyone have a proper working setup for K2-0905? No matter what I try with this thing, it writes like complete shit.
are there any websites to share datasets, especially for voice models? please don't say huggingface, I've already gotten banned there twice
Anyways, have some voice samples:
https://litter.catbox.moe/yv97n6w894ktxft8.7z

>>106517575
>>106517602
what the hell? i get like 35t/s on my dual 5090 setup with that model. what am i doing wrong?

>>106517886
Modelscope should be usable

>>106517886
What the fuck kind of shit were you uploading to get banned TWICE? I found datasets for AI models that were full of porn and those are just fine.

>>106517717
Stop using OR mystery meat models

>>106517956
Personally I got banned once for hosting civitai-banned loras

>>106517886
how do i do vidrel?

>>106517963
Next step?

>>106517956
>./Misaka
>Japanese audio
How would you even use that provided voice? I thought this was primarily trained on English audio and text
t. have yet to actually use it

>>106517956
>What the fuck kind of shit were you uploading to get banned TWICE?
1st time was for a decrypted assembly dataset, 2nd time was for uploading a rip from Shotdex
>I found datasets for AI models that were full of porn and those are just fine.
Yeah, there is a lot of wild stuff they don't care about, like https://huggingface.co/datasets/mirav/gurobooru but they seem to hate anything that could get them in trouble for copyright

>>106518002
Meant for >>106517886
PLaMo 2 Technical Report
https://arxiv.org/abs/2509.04897
>In this report, we introduce PLaMo 2, a series of Japanese-focused large language models featuring a hybrid Samba-based architecture that transitions to full attention via continual pre-training to support 32K token contexts. Training leverages extensive synthetic corpora to overcome data scarcity, while computational efficiency is achieved through weight reuse and structured pruning. This efficient pruning methodology produces an 8B model that achieves performance comparable to our previous 100B model. Post-training further refines the models using a pipeline of supervised fine-tuning (SFT) and direct preference optimization (DPO), enhanced by synthetic Japanese instruction data and model merging techniques. Optimized for inference using vLLM and quantization with minimal accuracy loss, the PLaMo 2 models achieve state-of-the-art results on Japanese benchmarks, outperforming similarly-sized open models in instruction-following, language fluency, and Japanese-specific knowledge.
https://huggingface.co/pfnet
So plamo 2 was released like a month ago, but they reference a 2.1 in the paper that got a really good JPN to EN tl score. Not uploaded yet though. Anyway, posting for that one anon in case he still reads the thread
>>106518002
not the anon, but voice datasets will be really useful when pic related happens, not a bad idea to save stuff

>>106517276
For real. The Deep Blue guy has been at MS for like 2 decades
Recomposer: Event-roll-guided generative audio editing
https://arxiv.org/abs/2509.05256
>Editing complex real-world sound scenes is difficult because individual sound sources overlap in time. Generative models can fill-in missing or corrupted details based on their strong prior understanding of the data domain. We present a system for editing individual sound events within complex scenes able to delete, insert, and enhance individual sound events based on textual edit descriptions (e.g., ``enhance Door'') and a graphical representation of the event timing derived from an ``event roll'' transcription. We present an encoder-decoder transformer working on SoundStream representations, trained on synthetic (input, desired output) audio example pairs formed by adding isolated sound events to dense, real-world backgrounds. Evaluation reveals the importance of each part of the edit descriptions -- action, class, timing. Our work demonstrates ``recomposition'' is an important and practical application.
https://storage.googleapis.com/recomposer/index.html
Samples. From DeepMind. Pretty neat, and would be useful now that everyone is messing around with editing audio files

>>106518054
this is the one thing that will oneshot me, the second LLMs can moan, it's over

why is Rocinante 12B popular given its 12B size? Isn't it brain damaged?

>>106518179
same, ayahuasca didn't oneshot me, but I fear an LLM that has the intelligence and architecture to replicate the voices and personalities of my favorite characters will
the scent of detergent and ozone
the smell of green tea mixed with ozone
the odd mixture of strawberries, jasmine tea and ozone
god I hate gemini-slopped models

>>106518271
It's local, can't you just ban those? You can't do that with Gemini proper, although telling it not to use them works sometimes

>>106517581
>look at this nice dataset that I found. Lots of clean and annotated voices
>https://litter.catbox.moe/y6gwfuaneeni1cmt.mp3

LatticeWorld: A Multimodal Large Language Model-Empowered Framework for Interactive Complex World Generation
https://arxiv.org/abs/2509.05263
>Recent research has been increasingly focusing on developing 3D world models that simulate complex real-world scenarios. World models have found broad applications across various domains, including embodied AI, autonomous driving, entertainment, etc. A more realistic simulation with accurate physics will effectively narrow the sim-to-real gap and allow us to gather rich information about the real world conveniently. While traditional manual modeling has enabled the creation of virtual 3D scenes, modern approaches have leveraged advanced machine learning algorithms for 3D world generation, with most recent advances focusing on generative methods that can create virtual worlds based on user instructions. This work explores such a research direction by proposing LatticeWorld, a simple yet effective 3D world generation framework that streamlines the industrial production pipeline of 3D environments. LatticeWorld leverages lightweight LLMs (LLaMA-2-7B) alongside the industry-grade rendering engine (e.g., Unreal Engine 5) to generate a dynamic environment. Our proposed framework accepts textual descriptions and visual instructions as multimodal inputs and creates large-scale 3D interactive worlds with dynamic agents, featuring competitive multi-agent interaction, high-fidelity physics simulation, and real-time rendering. We conduct comprehensive experiments to evaluate LatticeWorld, showing that it achieves superior accuracy in scene layout generation and visual fidelity. Moreover, LatticeWorld achieves over a increase in industrial production efficiency while maintaining high creative quality compared with traditional manual production methods.
https://www.youtube.com/watch?v=8VWZXpERR18
From NetEase. Samples made with a finetuned LLaMA2 7B and UE5. Pretty cool, especially in time saved compared to a human

>>106518282
common alien tongue

>>106516529
Thanks for this. I tried CLI inferencing when it came out, but it was ass. I gave the wildminder one a try since everyone here is getting good results and it was also ass. The diogod suite is actually getting the voice correct, same settings, same sample for everything.
>I'm the guy that occasionally shills FishAudio here because that's the only other thing that has gotten this specific character voice right for me so far.

>>106517082
Zuck is betting everything on Wang and his team of OpenAI avengers

>>106518444
Just like metaverse, zuck won't let us down
Has layer skip been proven to be a meme?
I'm wondering how much worse the performance of a model would be if we were to skip some of the model's layers (probably the last few) when generating some of the tokens.
So you generate 2 tokens with all the layers, then 1 token with just 2/3 of the layers or whatever.
I suppose the same could be achieved with two models, like using a big model to generate half of the tokens and a small one to generate the other half or the like, but then you'd have to do PP twice, keep two KV caches, etc.
I wonder how that approach would compare to current MoEs in both speed and "accuracy".
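A toy sketch of that schedule, just to show the control flow: it's an untrained stack with no causal masking or KV cache, and real methods (e.g. the LayerSkip paper linked below) train with layer dropout and an early-exit loss rather than naively truncating at inference time.

import torch
import torch.nn as nn

class TinyStack(nn.Module):
    def __init__(self, d=64, n_layers=12, vocab=256):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
            for _ in range(n_layers))
        self.head = nn.Linear(d, vocab)

    def forward(self, ids, n_active):
        h = self.emb(ids)
        for layer in self.layers[:n_active]:  # run only the first n_active layers
            h = layer(h)
        return self.head(h)

model = TinyStack().eval()
ids = torch.randint(0, 256, (1, 4))
with torch.no_grad():
    for step in range(9):
        # two "full" tokens, then one token with 2/3 of the layers
        n_active = 8 if step % 3 == 2 else 12
        logits = model(ids, n_active)
        next_id = logits[:, -1].argmax(dim=-1, keepdim=True)
        ids = torch.cat([ids, next_id], dim=1)
print(ids)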
>>106518498
My guess is it's hard to train, especially combined with other current memes like MoE.

>>106518498
You've read the paper, right?
https://arxiv.org/abs/2404.16710

>>106518482
Did she really name her cat "cumshot"? Why are women like this?

>>106518282
>https://litter.catbox.moe/y6gwfuaneeni1cmt.mp3

So I have lots of old GPUs laying around. I'm thinking of running the biggest model that fits on the best GPU of the spares, with an okay context (4k or 8k). Then I'm thinking of a second GPU that I just run small models on for simple tasks like quick completions and shit, just so those are fast.
Does it make sense, or should I just rely on the big model for everything?

>>106518614
>okay context (4k or 8k)
2023 called

vibevoice has some serious potential, but is way too RNG
a shame they pulled the plug on releasing training code

>>106516757
Bad bait, why wouldn't you just pay for API at this point
>>106518054
It just now occurred to me that people could use a special pipeline where they prompt an LLM, their waifu or sona or whatever responds, and then VibeVoice "responds" to you in "their voice". A current side project of mine was trying to see if I could find two models in order to talk like specific fictional characters. I guess I should get back onto that now.
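The glue for that pipeline is tiny. A sketch, assuming a llama.cpp server on localhost:8080 (its OpenAI-compatible chat endpoint is real) and handing the reply to your TTS of choice; "vibevoice-cli" here is a made-up placeholder for whatever script or workflow you actually run:

import requests
import subprocess

def chat(user_msg):
    r = requests.post("http://localhost:8080/v1/chat/completions", json={
        "messages": [
            {"role": "system", "content": "You are Miku. Stay in character."},
            {"role": "user", "content": user_msg},
        ],
    })
    return r.json()["choices"][0]["message"]["content"]

reply = chat("Good morning, how did you sleep?")
# "vibevoice-cli" is hypothetical; swap in your actual TTS entry point
subprocess.run(["vibevoice-cli", "--ref", "miku_sample.wav", "--text", reply])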
>>106517577
Is that some local archive viewer?

>>106518255
>popular
shilled

>>106518661
They released the paper and you can vibecode your own training code

>>106518839
yeah
https://huggingface.co/datasets/AUTOMATIC/jaicards
It's 190k cards but I'm struggling with accessing beyond the initial 10k.