xVATrainer

xVATrainer is a UI-based machine learning app for training TTS voice models on video game voices, for use in xVASynth.

All Reviews

Positive (10) - 100% of the 10 user reviews for this software are positive.

Release Date: April 1, 2022

Developer: Dan Ruta

Publisher: Dan Ruta

Features

Profile Features Limited

Languages:

Language	Interface	Full Audio	Subtitles
Korean	Not supported
English	✔	✔	✔

Title: xVATrainer
Genre: Audio Production

Use xVATrainer

Free

About This Software

xVATrainer is the companion app to xVASynth, the AI text-to-speech app using video game voices. xVATrainer is used for creating the voice models for xVASynth, and for curating and pre-processing the datasets used for training these models. Check the Nexus Mods page description for more details, instructions, and updates. Join the Discord for support and community advice.

IMPORTANT: The "priors" files NEED to be installed for v3 voice training to be possible, so don't forget to download and install them. The priors are synthetic data (plus some real data from the NVIDIA HIFI TTS and VCTK datasets) used to maintain multilingual and voice-range capabilities when fine-tuning individual voices, similar to DreamBooth training. Because of Steam's upload file size limits, they are distributed separately and can be freely downloaded from the Nexus Mods xVATrainer page.

Dataset annotation

The main screen of xVATrainer contains a dataset explorer, which gives you an easy way to view, analyze, and adjust the data samples in your dataset. It also lets you record a dataset of your own voice directly in the app, in the correct format.

Trainer

xVATrainer includes AI model training for the FastPitch1.1 (with a modified training set-up) and HiFi-GAN models (the xVASynth "v2" models). Training follows a multi-stage approach optimized for transfer learning (fine-tuning) quality. The trained models are exported in the format required by xVASynth, ready to use for generating audio.

Batch training is also supported, allowing you to queue up any number of datasets to train, with cross-session persistence. The training panel shows a cmd-like textual log of the training progress, a TensorBoard-like visual graph of the most relevant metrics, and a Task Manager-like set of system resource graphs.

Tools

xVATrainer includes several data pre-processing tools to help with almost any data preparation work your datasets need before training. The tools do not have to be run in any particular order, so long as your datasets end up as 22050Hz mono wav files of clean speech audio, up to about 10 seconds in length, along with a transcript file containing each audio file's transcript. Depending on where your data comes from, pick the tools you need to bring your dataset into that format. The included tools are:

Audio formatting - a tool to convert from most audio formats into the required 22050Hz mono .wav format
AI speaker diarization - an AI model that automatically extracts short slices of speech audio from longer audio samples (including audio as long as a feature-length movie). The audio slices are also automatically separated by individual speaker
AI source separation - an AI model that can remove background noise, music, and echo from an audio clip of speech
Audio Normalization - a tool which normalizes (EBU R128) audio to standard loudness
WEM to OGG - a tool to convert from a common audio format found in game files, to a playable .ogg format. Use the "Audio formatting" tool to convert this to the required .wav format
Cluster speakers - a tool which uses an AI model to encode audio files, and then clusters them into a known or unknown number of clusters, either separating multiple speakers, or single-speaker audio styles
Speaker similarity search - a tool which encodes some query files and a larger corpus of audio files, and then re-orders the larger corpus according to each file's similarity to all the query files
Speaker cluster similarity search - the same as the "Speaker similarity search" tool, but using clusters calculated via the "Cluster speakers" tool as data points in the corpus to sort
Transcribe - an AI model which automatically generates a text transcript for audio files
WER transcript evaluation - a tool which compares your dataset's transcript against one auto-generated via the "Transcribe" tool, using word error rate (WER) as a quality check. Useful for finding transcription errors when you supply your own transcript
Remove background noise - a more traditional noise removal tool, which uses a clip of just noise as reference to remove from a larger corpus of audio which consistently has matching background noise
Silence Split - a simple tool which splits long audio clips based on configurable silence detection
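
As an illustration of the target format described above (22050Hz mono wav files, up to about 10 seconds each), here is a minimal Python sketch that checks a folder of .wav files using only the standard library. It is not part of xVATrainer. The function names and the hard 10-second cutoff are assumptions made for this example, and it does not check the transcript file, whose layout is not described here.

```python
import wave
from pathlib import Path

TARGET_RATE = 22050    # Hz, from the format described above
TARGET_CHANNELS = 1    # mono
MAX_SECONDS = 10.0     # "up to about 10 seconds"; the exact cutoff is an assumption


def check_wav(path):
    """Return a list of human-readable problems found in one .wav file."""
    problems = []
    with wave.open(str(path), "rb") as wav:
        rate = wav.getframerate()
        channels = wav.getnchannels()
        seconds = wav.getnframes() / rate if rate else 0.0
    if rate != TARGET_RATE:
        problems.append(f"sample rate is {rate} Hz, expected {TARGET_RATE} Hz")
    if channels != TARGET_CHANNELS:
        problems.append(f"{channels} channels, expected mono")
    if seconds > MAX_SECONDS:
        problems.append(f"{seconds:.1f} s long, expected at most {MAX_SECONDS:.0f} s")
    return problems


def check_dataset(folder):
    """Map each offending file name in `folder` to its list of problems."""
    report = {}
    for path in sorted(Path(folder).glob("*.wav")):
        problems = check_wav(path)
        if problems:
            report[path.name] = problems
    return report
```

Calling check_dataset on a dataset folder returns an empty dict when every file matches. Any file that shows up in the report is a candidate for the "Audio formatting" or "Silence Split" tools.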

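The metric behind the "WER transcript evaluation" tool, word error rate, is conventionally the word-level edit distance between two transcripts divided by the number of words in the reference. The Python sketch below shows that standard definition only. How xVATrainer itself normalizes text, and which transcript it treats as the reference, are not stated here, so the lowercasing and whitespace splitting are assumptions made for this example.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance divided by the number of reference words."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # previous[j] is the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

A score of 0.0 means the two transcripts match word for word. Higher scores flag lines worth re-checking by ear.
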
Special thanks:

D0lphin, flyingvelociraptor, Caden Black, Max Loef, LadyVaudry, Thuggysmurf, radbeetle, TomahawkJackson, Solstice_, Bungles, midori95, eldayualien, John Detwiler, Cecell, Wandering Youth, ellia, Retlaw83, Trixie, CHASE MCKELVY, Leif, ionite, Joshua Jones, Jaktt1337, David Keith vun Kannon, Netherworks (Jo-Jo), neci, Rachel Wiles, Imogen, Deer, Linthar, sadfer, Danielle, Hector Medima, Sh1tMagnet, ReaperStoleMyStyle, AshbeeGaming, TCG, Lady Steel, Mikkel Jensen, CookieGalaxy, GrumpyBen, Adrilz, ReyVenom, dog, bourbonicRecluse, ShiningEdge, Dozen9292, manlethamlet, smokeandash, Elias V, EnculerDeTaMere, SKiLLsSoLoN, J, finalfrog, Hound740, Buck, Yael van Dok, ChrisTheStranger, Isabel, Fuzzy Lonesome, Drake, Beto, AceAvenger, bobbigmac, Alexandra Whitton, yic17, Joebobslim, ThatGuyWithaFace, Sergey Trifonov, Zensho, AgitoRivers, beccatoria, valo999, Ne0nFLaSH, Caro Tuts, Jack in the Hinter, Hammerhead96 ., Bewitched, Para, Wht??? Why??, Shadowtigers, PConD, Lulzar, Ryan W, Wyntilda, Gorim, Krazon, Tako-kun, Walt, Katsuki, Ember2528, RetconReality, Hazel Louise Steele, Laura Almeida, Althecow, PatronGuy, squirecrow, cramonty, crash blue, Syrr, David, Hawkbar, John S., Autumn, pimphat, FeralByrd, Comical, Dogmeat114, Dezmar-Sama, Michael Gill, Jacob Garbe, NerfViking, Dinonugget, RedneckJP007, stormalize, Golem, Luckystroker, Hapax, Vahzah Vulom, Tempuc, CAW CAW, stljeffbb, bart, MrJoy, Zoenna, Calvin, Aosana Bluewing, Dan Brookes, CDante, HunterAP, Kadisra, candied_skull, hairahcaz, nairaiwu, Mar, Paraffine, Nawen_Syaka, Amy Parker, Loseron, katiefraggle, Freon, deepbluefrog, myles.app, hanbonzan, Scientist Salari-Ren, Roman Tinkov, zackc1play, An abstract kind of horror, L, Mihu123, Trisket, Aelarr, Flipdark95, Timo Steiner, humocs, Optimist Vamscenes, Patrick VanDusen, praxis22, Rui Orey, Craig Fedynich, FrenchToast, Dorpz, cesm23, BoB, Cutup, Botty Butler, tjn2222, Matthew Warren, Tom Green, Passionate Lobster, Precipitation, Veks, Baki Balcioglu, Fenris, Patrik K., 
Oddbrother, E.M.A, DrogerKerchva, Camurai, hthek, iggyzee, Moppy, Stee_Muttlet, asbestos my beloved, TrueBlue, something106, woah00z, Sam Darling, JoshuaJSlone, vvvpppmmm, OvrTheTopMan, munchyfly, DarkNemphis, Justin McGough, Billyro, DIY_Rene, kevmasters, Stu, Sasquatch Bill, Inconsistent, Gothic 3 The Age of War, www48, Slothman, mavrodya petrov, ronaldomoon, Kostin Oleksandr Anatoliiovych, Ryan Lippen, Edward Hyde, Echoes, Vape Gwagwa, Kelg Celcs, Kneelers, Meryl Coker, Alan Gonzalez, PTC001, Hector Medima, CinnaMewRoll, Grant Spielbusch, Sean Lyons, Charles Hufnagel, Kirill Akimov, Mister Lyosea, Anthony Crane, Sh1tMagnet

System Requirements

Minimum:

Requires a 64-bit processor and operating system
OS: Windows 10
Processor: i7 4700k or later (the more cores/threads the better)
Memory: 8 GB RAM
Graphics: NVIDIA and CUDA, 6+GB VRAM
Storage: 25 GB available space

Recommended:

Requires a 64-bit processor and operating system
