Is there FOSS that uses an AI model which takes text, voice, and tone (a specific tone, not just style tags) as 3 separate inputs to generate speech? Or even just text and tone, and then I can try to apply a voice on top of that as another step.
Yes
>>106872917
ElevenLabs. Ignore the tards.
>>106872965
>Ignore the tards
Use case for ignoring the tards?
>>106872917
There are TTS models that generate speech from a text prompt and let you set the tone, like GPT-SoVITS, and then you do voice cloning with something like RVC. I think this will be your pipeline. I'm too lazy to google it rn.
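Rough sketch of the two-step pipeline described above: first TTS with a tone reference, then voice conversion on the result. Everything here is made up for illustration and is not the actual GPT-SoVITS or RVC API: `Audio`, `run_pipeline`, `stub_tts`, `stub_vc`, and the `target_voice.pth` file name are all hypothetical, and the stubs only pass placeholder data around. You would swap in whatever those tools actually expose.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Audio:
    """Placeholder for a waveform; real tools would read/write WAV files."""
    samples: List[float]
    sample_rate: int


# Hypothetical stage signatures, not any real library's API.
TtsStage = Callable[[str, Audio], Audio]  # (text, tone reference clip) -> speech
VcStage = Callable[[Audio, str], Audio]   # (speech, target voice model) -> speech


def run_pipeline(text: str, tone_ref: Audio, voice_model: str,
                 tts: TtsStage, vc: VcStage) -> Audio:
    """Step 1: synthesize the text with the desired tone. Step 2: swap the voice."""
    toned_speech = tts(text, tone_ref)
    return vc(toned_speech, voice_model)


def stub_tts(text: str, tone_ref: Audio) -> Audio:
    # Stand-in for a tone-conditioned TTS: emits one silent sample per character.
    return Audio([0.0] * len(text), tone_ref.sample_rate)


def stub_vc(speech: Audio, voice_model: str) -> Audio:
    # Stand-in for voice conversion: copies the audio unchanged.
    return Audio(list(speech.samples), speech.sample_rate)


if __name__ == "__main__":
    tone = Audio([0.0] * 100, 22050)  # pretend this is a clip with the tone you want
    out = run_pipeline("hello anon", tone, "target_voice.pth", stub_tts, stub_vc)
    print(len(out.samples), out.sample_rate)
```

The point of keeping the stages as plain callables is that tone and voice stay separate inputs: the tone clip only goes into step 1 and the target voice only into step 2.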
>>106872965
>>106873066
Thank you, Anons.
>>106872976
Use case for ebussy post?
>>106873114
It's the new "4you".