Previous thread >>5603777
Dedicated Suno/Udio thread >>5652124
Dedicated Luma thread >>5624871

Post AI generated stuff. Song covers, animations, etc.
OC encouraged, but not required.
This thread focuses on audio and video with an audio component.
Let me know if you have more links to add. This thread is a work in progress.

> Voice-to-Voice
https://github.com/Mangio621/Mangio-RVC-Fork
https://github.com/Vali-98/XTTS-RVC-UI
https://github.com/IAHispano/Applio
https://github.com/voicepaw/so-vits-svc-fork

> Text-to-Speech
https://github.com/collabora/WhisperSpeech
https://github.com/myshell-ai/OpenVoice
https://github.com/yl4579/StyleTTS2
https://github.com/BoltzmannEntropy/xtts2-ui
https://github.com/daswer123/xtts-webui (Warning: Windows version uses prebuilt binaries that anons haven't verified)

> Music
https://github.com/facebookresearch/audiocraft
https://rentry.org/AudioCraftRemix

> Text-to-Video, Image-to-Video
https://github.com/Kosinkadink/ComfyUI-AnimateDiff-Evolved
https://haiper.ai
https://lumalabs.ai/dream-machine
https://kling.kuaishou.com/en

> Deepfake and Lipsync
https://github.com/Gourieff/sd-webui-reactor
https://github.com/Hillobar/Rope
https://github.com/Mozer/wav2lip
https://github.com/saifhassan/Wav2Lip-HD

> Audio Cleanup
UVR Walkthrough: https://docs.google.com/document/d/17fjNvJzj8ZGSer7c7OFe_CNfUKbAxEh_OBv94ZdRG5c/edit#heading=h.n8ac32fhltgg
https://github.com/Anjok07/ultimatevocalremovergui
https://github.com/resemble-ai/resemble-enhance
https://github.com/yinruiqing/pyannote-whisper

> Related boards
>>>/aco/asdg
>>>/aco/csdg
>>>/b/degen
>>>/d/ddg
>>>/e/edg
>>>/g/sdg
>>>/g/lmg
>>>/g/aicg
>>>/h/hdg
>>>/trash/sdg
>>>/u/sdg
>>>/vg/aids
>>>/vt/vtai
Luma stuff is fine, but if you have an unedited 5 second clip without sound, consider posting in the dedicated thread. >>5624871
This dude was cool enough to provide the original source. Useful if you're interested in how to make stuff like this.https://mega.nz/file/Lhc21KqJ#qX_kBbBORJMSOkm0SCG4gkkRwI0VQjnICS6mc3AqSt8
>>5665869kek
>>5670846I'm fucking dyingI hope that the first 5D cyber-entertainmentscape prototypes involve Will Smith and quantum spaghetti
>>5670846
I believe this was done using Hailuo, the new kid on the block:
https://hailuoai.com/video
go my son!and zapto the extreme!
>>5665380Holy shit, I never realized what was missing in my life was post-apocalyptic muppets movie.
>>5672003I was really hoping Shrek would walk into frame in this one too
>>5672192I guess I don't get the reference so I don't know what you're going for>>5672212lel what is the audio from?
>>5673261>The Kalevala (IPA: [ˈkɑleʋɑlɑ]) is a 19th-century compilation of epic poetry, compiled by Elias Lönnrot from Karelian and Finnish oral folklore and mythology,[1] telling an epic story about the Creation of the Earth, describing the controversies and retaliatory voyages between the peoples of the land of Kalevala called Väinölä and the land of Pohjola and their various protagonists and antagonists, as well as the construction and robbery of the epic mythical wealth-making machine Sampo
>>5673201Bahahah is that supposed to be Tiedrich or whatever the fuck the spastic cunt's name is?
>>5673742
Uncle Ted actually. I don't think it has enough training data; it usually tries to portray him as very old. It's also come close to Sam Hyde.
Anyone have the AI songs of the Indians singing?
>>5674044I got you mate
>>5674204Thanks mate, won't lose it again!
>>5665373
https://aiartes.com/voiceai
Anyone got more vocal sample databases? I got some nice voices from this one.
>>5673261Fingolian deep lore.
>>5668055based brazillian wizard
>>5677222
>>5672077Clearly his shadowed shoulder in the foreground lmao
>>5671641Cursed
NotebookLM from google can make podcasts on anything https://voca.ro/1j5AUvn1axdu
>>5671300https://github.com/dominickp/ccpai
>>5678983> HOSTNAME = base64.b64decode("aGFpbHVvYWkuY29t").decode('utf-8')> looks inside> hailuoai.com
>>5678990It appears to be obfuscated to stay under the radar. Looks like a way to access the video gen without registering or messing with the finicky browser interface.
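The "obfuscation" here is plain Base64, which anyone can reverse without a key. A minimal sketch that decodes the string quoted above (the names `ENCODED_HOST` and `decode_hostname` are made up for this example, they are not taken from the repo):

```python
import base64

# The Base64 string quoted in the post above.
ENCODED_HOST = "aGFpbHVvYWkuY29t"


def decode_hostname(encoded: str) -> str:
    """Decode a Base64 string into UTF-8 text."""
    return base64.b64decode(encoded).decode("utf-8")


if __name__ == "__main__":
    print(decode_hostname(ENCODED_HOST))  # prints "hailuoai.com"
```

So this only keeps the hostname out of a plain text search; it doesn't hide anything from someone who reads the code.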
>>5667007Literally just Richard Cheese
>>5670846lol
>>5665373
Anyone know of the best solution for making a living picture? Like I'm not looking to have someone dance around based on a source image like luma, I just want somewhat highly detailed picture to come to life with a little motion to spice up a video that uses static images for illustrative purposes. FOSS solution if possible.
>>5682100But that's just the speech he was giving when that guy tryed to shoot him
>>5683698
I found something (the img2video model got released just yesterday in fact) but I don't have the VRAM to run it. Evidently you need over 16GB. I was able to run the online demo though and it's interesting. There's also a text to video model that uses less VRAM, I'll try messing with that at some point.
https://github.com/THUDM/CogVideo
anyone here use facefusion?
it does everything: it changes the face and processes the images, but it doesn't make the video
at first i thought it was the nsfw filter, but i tried with sfw stuff and it does the same
Make Michael Jackson sing Black or White but every other word is 'nigger' and gassing the jews is mentioned in the rap.
>>5685375shut the fuck up poojeet
>>5685375im not having any problems with video. if you want to get around the nsfw filter edit "\facefusion\content_analyser.py" , add "return False" below line 73>def analyse_frame(vision_frame : VisionFrame) -> bool:> return False
>>5685375
Not sure what problem you're having exactly, but if you have the whole image sequence and the only thing missing is the video itself, you can just stitch the images together manually with ffmpeg:
ffmpeg -framerate 12 -pattern_type glob -i '*.png' -c:v libx264 output.mp4
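If you'd rather script it, here's a rough Python sketch that builds the same ffmpeg call as an argument list. The function name `build_stitch_command` and its defaults are made up for this example; the flags are the ones from the command above. It assumes ffmpeg is on your PATH when you actually run it.

```python
import shlex


def build_stitch_command(pattern="*.png", fps=12, output="output.mp4"):
    """Return the ffmpeg argument list that turns an image sequence into a video."""
    return [
        "ffmpeg",
        "-framerate", str(fps),    # frame rate of the input image sequence
        "-pattern_type", "glob",   # let ffmpeg expand the wildcard itself
        "-i", pattern,
        "-c:v", "libx264",
        output,
    ]


if __name__ == "__main__":
    # Show the command as it would be typed in a shell.
    print(shlex.join(build_stitch_command()))
```

To run it for real, pass the list to `subprocess.run(build_stitch_command(), check=True)`. Passing a list means the shell never touches the `*.png` pattern, so no extra quoting is needed.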
>>5685732
another useful trick: if you're converting from webm to mp4, libx264 requires even width and height. Either keep the same output encoder, or pad the frame up to even dimensions when it isn't:
> -vf "pad=ceil(iw/2)*2:ceil(ih/2)*2" -pix_fmt yuv420p -c:v libx264
also you can add the audio back in with
> -i path\to\temp\folder\%08d.png -i original_video.mp4 -map 0:v -map 1:a -shortest
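A small sketch of what the pad expression works out to, plus the audio remux as an argument list. The function names and the `fps` parameter are assumptions added for this example. Note that the stream selector is written `-map 1:a` (with a space), and that ffmpeg reads image sequences at 25 fps unless `-framerate` is given.

```python
import math
from typing import List, Tuple


def even_dimensions(width: int, height: int) -> Tuple[int, int]:
    """Round each dimension up to the next even number, like ceil(iw/2)*2."""
    return math.ceil(width / 2) * 2, math.ceil(height / 2) * 2


def build_mux_command(frames: str, original: str, output: str, fps: int = 25) -> List[str]:
    """Build an ffmpeg call: frames as video, audio taken from the original file."""
    return [
        "ffmpeg",
        "-framerate", str(fps),        # applies to the image sequence input
        "-i", frames,                  # input 0: e.g. frames/%08d.png
        "-i", original,                # input 1: the source video with audio
        "-map", "0:v",                 # video from the frames
        "-map", "1:a",                 # audio from the original
        "-vf", "pad=ceil(iw/2)*2:ceil(ih/2)*2",
        "-pix_fmt", "yuv420p",
        "-c:v", "libx264",
        "-shortest",                   # stop at the shorter of the two streams
        output,
    ]


if __name__ == "__main__":
    print(even_dimensions(853, 481))
    print(build_mux_command("frames/%08d.png", "original_video.mp4", "out.mp4"))
```

Since the filter string is passed as one list element, the parentheses don't need shell quoting here.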
>>5685532>>5685732>>5685830awesome, thankyou guys
>>5685451You don't get it. Michael was the most based of all.Not just a man of culture, he was from outer space. A talented builder of bridges, the guy could spot the evil ones and call them out.Hell, he was so based he turned WHITE! Like Jackson went Super Saiyan Based Keiyo Ken times 6million. White isn't even his final form.
>>5686191So you're trying to tell me that when a black mans based levels reach over 9000, he becomes white?
>>5670900this feels like a fever dream
>>5685943Lol ffs
>>5670846I imagine a future, in many many years, where Hollywood is using AI tech to generate Will Smith for Men in Black 12. And as they train their AI generating Will Smith, no matter what they do spaghetti slips into the scenes. A long forgotten meme they cannot purge, so they must simply write it into the story.good ole agent Will Smith, never far from his favorite food, spaghetti.
is text to speech even possible with 4 GB of VRAM?
I don't want to use darpa (11.lab), or pay anyone.
>>5672212kek that audio>>5673261family guyhttps://youtu.be/1fp0ZfEVgnY
>>5687427try it and find out? this runs pretty light for me> https://github.com/BoltzmannEntropy/xtts2-ui
Google just dropped new tool. Takes like five minutes to read a book and it spits out a summary, doesn't even have to be text based, you can feed this thing pdfs and it scans the text and creates a conversation around 12 minutes long.
>>5688716Making it do a podcast style summary is way too uncanny valley for me but it's cool tech nonetheless.
>>5689459this audio activated every single one of my neurons holy fuck
>>5688716Just read the book you lazy tard
>>5689467Try this on for size.
>>5676108HELL YEAH BROTHER!BLACK LIVES MATTER!NOW GO BURN THAT LOCAL HARDWARE STORE AND MAKE SURE YOU PAY YOUR IPHONE PAYMENT PLANKEEP FIGHTING THE GOOD FIGHT!!THANK BRO
I've tried 5 different lip sync solutions and the only one I can get to work on my machine is the shitty one. Feelsbadman.
>>5669240Anyone got the Sarah Silverman cover of this?
>>5689475Lol what a fucking retard wasted hours of his time when he could make the Robot read it and tell you what it was about.Bet you walk to the store and washcloths by hand you fucking plebbian
wow, i didn't realize text to video was already this good. I thought we wouldn't see something even half as good as this for another 3 or 4 years at least.
>>5665377holy shit Im fucking dead
Has Hailuo improved at all recently?
>>5696335I don't think there's been major changes but it does seem to be changing incrementally. It's only been around for a few weeks.
>>5670900>shadman
>>5672003how can the jews recover from this?
>>5697351>70% of my bible is literally the Torah>my messiah is a jewish rebel who compared gentiles to dogs and only preached to other jews>that's how you know I hate jews
>>5695664holy shit a fucking ghost!
>>5697581The Torah only covers the first 5 books of the old testament, out of 24 books. Can you tell me what passage Jesus explicitly compared gentiles to dogs?
>>5694856It's probably this good because the Chinese don't give a shit about ethics and copyright so they train on whatever.
>you can now use prompts to apply custom filters to existing videosim expecting a golden age of stitching together clips from existing media then applying filters and other AI effects to create a cohesive work.why generate dozens of prompts trying to get a good one, when you can just take someone else's work, apply a filter, and make it your own?
>>5697351you fool, they're going to ai generate the holocaust now
>>5700231
That is already a thing with the fake trailers on Youtube but these tools can make it way more advanced.
Also, you could just film yourself doing something and apply a filter over it, or animate some crude 3D models together. I would make cancelled stuff or stuff that sucked. Imagine remaking bad Star Wars movies into good ones.
>>5700273Yeah, seems like a great way to make your own animations or whatever using potentially just yourself as reference.
>>5700270>whyboner.jpg
>>5697635upvoted
>>5685943fucking kek
>>5694261>he listens faster than he readslmao low IQ
>>5692049that's the bad one?
>>5670900Conspiracists will say this is fake
>>5700705It's incredibly wonky to work with, I had to edit around it. It breaks down a lot harder in this one.
>>5700870I do not remember this scene.
>>5685943kek!
>>5699170The west's respect for ethics and poors will be our downfall. Europe is already falling behind, and america will follow suit of democrats win
>>5702078No the downfall will not be killing open faced enemies in hook noses, orcs and foreign invaders
>>5702078>>5699170>respect for ethics and poorlmao, every country in the west is governed by and for bankers, corporations, politicians and outright criminal organizations, there is high level of inequality, every billionaire and most ceos are psychopaths, most big companies steal, cut corners, bribe, evade taxes and treat people like numbers. I get that things are away worse in places like china where they don't even pretend, but any semblance of ethics in the west is a facade sold to poor people.
>>5702590I should have put "ethics" in quotes, because I'm talking about the "ethics" as it applies to these AI companies. In other words, nerfing their models to make their investors happy. The Chinese are clearly training on stuff that the American companies aren't because of "safety" and "ethics".
>>5665373Of all the fictional and meme fighters, Meme Chuck Norris is unbeatable. Goku, Superman, Saitama? Chuck Norris comes out top.
>>5706783LMAO
>>5708404I'd buy this drink.
>>5708407Diesel is a real drink, just a few bucks a gallon. Pretty cheap as far as drinks go.
>>5699170fucking kek
>>5706783holy shit lmao
https://youtu.be/Y8ob_nTMyY8
You have to sign in to make videos on Hailuo now, fuck.
>>5685943he nailed it
>>5706783God damn, this one is golden!
>>5710587that's bandu my cousin you mother bitch
>>5709419Shame
>>5706783Fucking lost it.
>>5700837
>>5710675>>5710682>>5710690Tom Cruise is Ben Stiller as Tugg Speedman in Scorcher: the Reduxology
>>5709419I'm just happy to see a thread outside of /aco/ it's not healthy to just use it to coom.
>>5711454not even funny, just accurate
>>5679021More a testament to Richard Cheese that an AI Frank Sinatra cover of his version of Down with the Sickness is nearly indistinguishable
>>5684944Wow ancient meme
https://youtu.be/Te5Ztl4YNEI
>>5710682>>5710675>>5710690Cruise Kino will never end!
Did this cover last year but decided to try again with one year of improvement.
>>5711454there was another one about Indians driving uber and raping women, anyone got that?
I discovered a text to speech application called MaskGCT that, like F5 TTS, was trained on the Emilia dataset. Unlike F5 TTS, the vocos quality isn't bad. But the catch is that the prompt can't be too short; otherwise, the output will sound like robotron.
https://huggingface.co/spaces/amphion/maskgct
Here's a video example with MaskGCT audio.
>>5714437Is there a github for it?Also what did you use for the visuals?
Funni fish songhttps://suno.com/song/afb3ed6c-0855-4bea-974d-9b7e0780a97f
>>5714442
Here, but know that 8 GB VRAM isn't enough. It needs at least 12 GB, otherwise you will run out of memory like I did. A short prompt managed to give an output, however it took over 10 minutes to generate... maybe someone here can optimize it to use less?
git clone https://huggingface.co/spaces/amphion/maskgct
cd maskgct
Create a python environment:
python -m venv venv
source ./venv/bin/activate
pip install spaces huggingface_hub==0.24.7
pip install -r requirements.txt
python app.py
>>5711657>>5713758damn straight
>>5714793
Huh, I'm not used to cloning huggingface demo spaces, I guess it is a git repo as well.
Of course it's nvidia. I'll try to get it working for rocm but we'll see I guess.
Found the github for I guess the all in one toolkit, I'm not actually sure what the github hosts but there's a lot of info about the project as a whole. They seem to be working on a suite of interesting stuff.
https://github.com/open-mmlab/Amphion
[Verse]Those you laugh at those for who you prayWhat's the difference it's all the sameFeed the narcissist consume their dreadBecome the villain two steps ahead[Verse 2]Smiles put on but truth ain't thereMasks we wear like we just don't carePromises broken with every breathWhispered secrets till there's nothing left[Chorus]Here we go spinning tales so fineTwisted fables crossing every lineStanding tall on the thinnest threadBecome the villain two steps ahead[Verse 3]Seeing through the smoke and fogTruth's a joke bite down on the slogAlign your stars but don't you fallIn the end there's no curtain call[Heavy Metal breakdown][Bridge]Dance through shadows laugh in the darkHope the whispers don’t leave a markWhen the mirrors shatter don’t care what it showsThe game is on that's how it goes[Chorus]Here we go spinning tales so fineTwisted fables crossing every lineStanding tall on the thinnest threadBecome the villain two steps ahead
>>5700231Would it be allowed to apply video and audio AI filters on full-length movies to circumvent copyright strikes on YouTube?
>>5690690He's literally me.
>>5715532
>>5715858This looks nothing like Vince
>>5714793
Ok I had a weird ass error and inspected app.py to find that it's doing some shady shit at the top where it pip installs a specific gradio that's just broken (who the fuck wrote this code?)
I fixed it by commenting out the line:
# subprocess.check_call([sys.executable, "-m", "pip", "install", "gradio==4.37.1"])
and did:
pip install gradio==4.44.1
Unfortunately it's still not recognizing my GPU so it used CPU inference, which is slow but not completely unusable. I'm not giving up though, because the gradio app must have been written by an intern or some shit and there might be some simple fixes.
For what it is, I'm impressed though.
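A quick way to check whether the GPU is visible at all, before blaming the app. This is only a sketch, assuming the app runs on PyTorch; `pick_device` is a made-up helper name, not something from the MaskGCT code.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see a GPU, otherwise "cpu"."""
    try:
        import torch  # assumption: PyTorch is the inference backend
    except ImportError:
        # No PyTorch in this environment, so nothing can run on the GPU.
        return "cpu"
    # ROCm builds of PyTorch report AMD GPUs through the same torch.cuda API.
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print("inference device:", pick_device())
```

If this prints `cpu` inside the venv, the problem is the PyTorch install rather than the gradio app.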
Anyone here ever use the voice changer in eleven labs? Supposedly it's supposed to be more accurate, but I need to hear some samples, I'm also a little uncomfortable with the fact that eleven labs will have access to my voice afterwards.
>>5716115If you're concerned about privacy, use one of the open source speech to speech solutions in the OP. They're actually pretty good if the models are trained correctly. The downside is that you'll have to find models or train them yourself, but there's not really an alternative if you don't want to give some company your voiceprint. Though I suppose you could run your voice through a voice changer first before giving the sample to elevenlabs.
>>5716024
Finally got the GPU working on ROCm. It was simple: the requirements.txt file clobbered my pytorch install, so I just did a pip install --upgrade of the proper pytorch for my system and it works.
Cut the inference time down from about 5 minutes on CPU to about 40 seconds on GPU, ~14GB VRAM.
Definitely not a one-click install for Linux/AMD but it works.
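To spot this kind of clobbering before it happens, you can scan a requirements file for lines that would reinstall PyTorch. A minimal sketch; the package set and the function name are assumptions for this example, and the sample text in the usage block is invented, not the real MaskGCT requirements.txt.

```python
import re

# Packages treated as "part of the PyTorch install" for this sketch.
TORCH_PACKAGES = {"torch", "torchvision", "torchaudio"}


def find_torch_requirements(requirements_text: str) -> list:
    """Return requirement lines that name a PyTorch package."""
    hits = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()      # drop comments
        if not line or line.startswith("-"):     # skip blanks and pip options
            continue
        # The package name ends at the first specifier, extra, or marker character.
        name = re.split(r"[\s\[<>=!~;@]", line, maxsplit=1)[0].lower()
        if name in TORCH_PACKAGES:
            hits.append(line)
    return hits


if __name__ == "__main__":
    sample = "numpy==1.26\ntorch==2.1.0  # pinned\ntorchaudio\ngradio>=4\n"
    print(find_torch_requirements(sample))
```

Any line it reports can be commented out before running `pip install -r requirements.txt`, so the PyTorch build you installed for your GPU stays in place.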
Is it worth trying to make money with AI Content?It seems new and like there's potential here but at the same time highly competitiveStill though I really want to build a portfolio with AI stuff so I can sell into something better later. This isn't my primary income but I want sidegigs and projects and AI seems like a no brainer. Sort of.
>>5716456AI stuff is a totally saturated market.On the back end if you know how to train stuff and know more about how things actually work, it's probably valuable experience. On the front end if you just make memes or whatever, I can't see a lot of money in it unless you become the next DemonFlyingFox or There I Ruined it or something like that.I think some pajeets make money doing requests for like nudes of real people and shit like that but I can't imagine it's very profitable.
>>5689170song?
>>5697581kek
>>5665373
Can't find the name of the song. Does anyone know?
George Michael/Digimon came out okay I think, had to lower the instrument a bit down from the source
>>5717160
>>5675154>>5675156kek
>>5675154Consider you guys are making so much pure crap nobody can look at all of it.
>>5675154He was so young back then
>>5717375>>5718152Not sure what's going on with these but his voice has a bit too much helium. Joe sounds good though.
>>5665373Janky movement
>>5718285Bad reference sample imo.
>>5718285>>5718152I redid with F5 tts.
>>5717375>>5718910
>>5714437
>>5718960
>>5718910
F5 tts. Runs fine on my 8GB card and I think it's superior to all other local models. Correct me if I'm wrong.
https://huggingface.co/spaces/mrfakename/E2-F5-TTS/
>>5718910idk, the maskgct version sounds less flat. Helium voice aside it sounds better to me.
I haven't tried this yet, but apparently usable with 16ish GB VRAM.
https://github.com/kijai/ComfyUI-MochiWrapper
>>5684944An oldie but a goodie
>>5706783This is what AI was made for!
>>5710086>that endingNOOOOOPE NONONONONONOPE
>>5721243The one on the right is the real one. Bitch is ugly af
>>5713667what a fuckhorrible choice of fonts
>>5721046
I got this working but it consumes a crazy amount of VRAM.
Tweaked the example workflow to fit right at the 16GB limit for me. I would try again with more steps to make it look better but this already took 45 minutes to gen.
https://files.catbox.moe/b3tnb8.json
bros I forgot that i've got 50 credits on lumalabs and cancelled it because hailuoai came outanyone have prompts they want me to pass to the paid version?any idea on how to get better results than vid rel?
>>5723793I liked it when people kept extending stuff, so you would get clips like 20 seconds long. Maybe take an old meme and see what crazy shit it does when you extend it 2 or 3 times.
ChatGPT currently refuses to give me anything space related without star wars content. It's kinda funny, but also pissing me off. I should move onto a real AI image generator but I'm so behind on all the AI shit these days, I don't know which would be good. time to experiment, but taking reqs
>>5725025>>>/g/lmg>>>/g/ldg
>>5725025If you just want static images, try a recent Stable Diffusion XL fine tune, Stable Diffusion 3.5, or Flux.1 dev
>>5670900this still makes more sense than my dreams
>>5725213>>5725129thx anons
>>5720254
MaskGCT is okay in sound quality but it's too slow. Like 100X slower. The quality difference between F5 and MaskGCT isn't that large imo, but the speed, dang. That needs to improve.
>>5714793
Has anyone confirmed that this works on Windows? I'm personally on Linux and it works for me, but I've heard about difficulties on Windows. There is a fork out there; I can't confirm that it works, but if someone could provide feedback on installing this or the amphion version, that would be appreciated.
https://github.com/justinjohn0306/MaskGCT-Windows
>>5731710
idk about those instructions specifically, but I've had it installed on 8GB vram, and as said earlier, it's extremely mega ultra slow and not worth it for me
i also have a george floyd one
>>5710745>Akira Kurosawa's Star Warsbut Star Wars is already George Lucas's "The Hidden Fortress"
>>5731920Yeah it consumes about 14GB so it's probably inferring on CPU in your case. I've just heard about some issues with the espeak dependency on Windows so I was wondering if it was a common problem, and it's not something I can personally verify.
>>5735660very accurate
>>5665373
Does anyone know about any AI video generation tool that allows you to choose the first and last images of the video?
>>5736659
That's more like an interpolator than anything.
It's very niche but the only tool I know off the top of my head is ToonCrafter.
https://github.com/Doubiiu/ToonCrafter
>>5736659just screen capture it in a regular video player.
>>5736859I already have the pictures I need. I just want the AI to generate a video in which the first frame is one of the images I provide it and the last frame is the other (or even just add one extra frame in between the pictures I provide it.)
>>5735604Got me good.
>>5736659
Use pinokio for the installation if you're feeling lazy:
https://github.com/jy0205/Pyramid-Flow
>>5737029 This one seems interesting. I'll check it out. Thanks.
>>5737029
Thanks anon, I'm gonna check this out. I can't run cogvideo because I don't have enough vram, but this claims to have CPU offloading so it's a lot more friendly to my machine.
>>5736920
If you're looking for just one frame interpolation, that's what RIFE is for. There are many projects that use it if you just search for "rife interpolation". I don't recommend doing more than 1 interpolation frame. Claims of it being able to do 16x are exaggerated.
https://github.com/hzwer/ECCV2022-RIFE
There's also an interpolator called FILM that takes a lot more time.
https://github.com/google-research/frame-interpolation
>>5737244
Oh yeah there's also this
https://github.com/GSeanCDAT/GIMM-VFI
https://github.com/kijai/ComfyUI-GIMM-VFI
Does anyone have the McChicken pasta?
Input audio is cleaned up with DeepFilterNet and an Audacity plugin, Acon Digital DeVerberate 3.
>>5737029
Got the 384p version working on 16GB VRAM but apparently the 768p needs 24GB+. 10 seconds also seems to be less reliable than 5 seconds. Overall it takes a lot less time and resources than Mochi >>5723754
Couldn't get the gradio demo to work but comfy is fine.
https://github.com/kijai/ComfyUI-PyramidFlowWrapper
>>5737819It sounds pretty good. For anyone looking to do audio cleanup with foss tools, ultimate vocal remover has de-echo, de-reverb, and de-noise models that work pretty well. I'm gonna have to look into DeepFilterNet, it looks useful.
>>5697940I think he's referring to the Canaanite woman asking Jesus to heal her daughter in Matthew 15:22-29 and Mark 7:25-30.
>>5671641HE'S ALIVE