Ultimate RVC 💙
The type of source to retrieve a song from.
Select a song from the list of cached songs.
Select a model to use for voice conversion.
Whether to split the input voice track into smaller segments before converting it. This can improve output quality for longer voice tracks.
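One common way to do this kind of segmentation is to split on stretches of silence. The sketch below uses pydub as a stand-in; the app's actual segmentation strategy and thresholds are not specified here, and the file name is hypothetical:

```python
from pydub import AudioSegment
from pydub.silence import split_on_silence

# Load the input vocal track (file name is illustrative).
track = AudioSegment.from_file("vocals.wav")

# Split on stretches of silence; the thresholds are illustrative defaults.
segments = split_on_silence(
    track,
    min_silence_len=500,   # milliseconds of silence that trigger a split
    silence_thresh=-40,    # dBFS below which audio counts as silence
    keep_silence=100,      # milliseconds of padding kept around each segment
)
# Each segment would be converted on its own and the results concatenated.
```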
Whether to apply autotune to the converted voice.
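Autotune in this context typically means snapping the extracted pitch contour to the nearest semitone. A minimal sketch of that idea, assuming F0 values in Hz with unvoiced frames marked as 0:

```python
import numpy as np

def autotune_f0(f0: np.ndarray) -> np.ndarray:
    """Snap voiced F0 values (Hz) to the nearest equal-tempered semitone."""
    snapped = f0.copy()
    voiced = f0 > 0                               # 0 Hz marks unvoiced frames
    midi = 69 + 12 * np.log2(f0[voiced] / 440.0)  # Hz -> MIDI note number
    snapped[voiced] = 440.0 * 2.0 ** ((np.round(midi) - 69) / 12)  # back to Hz
    return snapped
```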
Whether to clean the converted voice using noise reduction algorithms.
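A sketch of such a clean-up pass using the noisereduce package (an assumption; the app may use a different denoiser, and the file names are hypothetical):

```python
import librosa
import noisereduce as nr
import soundfile as sf

audio, sr = librosa.load("converted_vocals.wav", sr=None)  # keep original rate
cleaned = nr.reduce_noise(y=audio, sr=sr)                  # spectral-gating denoise
sf.write("converted_vocals_clean.wav", cleaned, sr)
```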
The model to use for generating speaker embeddings.
Select a custom embedder model from the dropdown.
The sample rate of the mixed output track.
The audio format of the mixed output track.
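For reference, resampling and re-encoding a mixed track can look like the following, assuming librosa and soundfile; the 44.1 kHz target, FLAC format, and file names are illustrative:

```python
import librosa
import soundfile as sf

audio, sr = librosa.load("mixed_cover.wav", sr=None)
resampled = librosa.resample(audio, orig_sr=sr, target_sr=44100)
sf.write("mixed_cover.flac", resampled, 44100)  # format inferred from extension
```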
Show intermediate audio tracks produced during song cover generation.
The type of source to retrieve a song from.
Select a song from the list of cached songs.
Local directory where intermediate audio files are stored and loaded from. When a new song is retrieved, its directory is chosen by default.
The model to use for audio separation.
Select a model to use for voice conversion.
Local directory where intermediate audio files are stored and loaded from. When a new song is retrieved, its directory is chosen by default.
Whether to split the input voice track into smaller segments before converting it. This can improve output quality for longer voice tracks.
Whether to apply autotune to the converted voice.
Whether to clean the converted voice using noise reduction algorithms.
The model to use for generating speaker embeddings.
Select a custom embedder model from the dropdown.
Local directory where intermediate audio files are stored and loaded from. When a new song is retrieved, its directory is chosen by default.
The sample rate of the mixed output track.
The audio format of the mixed output track.
The type of source to generate speech from.
Select a voice to use for text-to-speech conversion.
Select a model to use for voice conversion.
Whether to split the input voice track into smaller segments before converting it. This can improve output quality for longer voice tracks.
Whether to apply autotune to the converted voice.
Whether to clean the converted voice using noise reduction algorithms.
The model to use for generating speaker embeddings.
Select a custom embedder model from the dropdown.
The sample rate of the mixed output track.
The audio format of the mixed output track.
Show intermediate audio tracks produced during speech generation.
The type of source to generate speech from.
Select a voice to use for text-to-speech conversion.
Select a model to use for voice conversion.
Whether to split the input voice track into smaller segments before converting it. This can improve output quality for longer voice tracks.
Whether to apply autotune to the converted voice.
Whether to clean the converted voice using noise reduction algorithms.
The model to use for generating speaker embeddings.
Select a custom embedder model from the dropdown.
The sample rate of the mixed output track.
The audio format of the mixed output track.
- Filter voice models by selecting one or more tags and/or providing a search query.
- Select a row in the table to autofill the name and URL for the given voice model in the form fields below.
Public models table:

| Name | Description | Tags | Credit | Added | URL |
| --- | --- | --- | --- | --- | --- |
| Compa - Hyperdimension Neptunia | Yuigahama Yui from Yahari Ore no Seishun Love Comedy wa Machigatteiru (250 Epochs) | Anime, Other Language, Real person | dacoolkid44 & hijack & Maki Ligon | 2023-07-31 | https://huggingface.co/zeerowiibu/WiibuRVCCollection/resolve/main/Compa%20(Choujigen%20Game%20Neptunia)%20(JPN)%20(RVC%20v2)%20(150%20Epochs).zip |
Select the pretrained model you want to download.
Select the sample rate for the pretrained model.
- Find the .pth file for a locally trained RVC model (e.g. in your local weights folder) and optionally also a corresponding .index file (e.g. in your logs/[name] folder)
- Upload the files directly or save them to a folder, then compress that folder and upload the resulting .zip file
- Enter a unique name for the uploaded model
- Click 'Upload'
- Find the config.json file and pytorch_model.bin file for a custom embedder model stored locally.
- Upload the files directly or save them to a folder, then compress that folder and upload the resulting .zip file (see the sketch after this list)
- Enter a unique name for the uploaded embedder model
- Click 'Upload'
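Both upload flows above accept a .zip of the model folder. A minimal way to produce one with the standard library; the folder path and archive name are hypothetical:

```python
import shutil

# Compress the folder holding config.json and pytorch_model.bin (or a .pth
# and .index pair) into my_model.zip for upload; paths are illustrative.
shutil.make_archive("my_model", "zip", root_dir="path/to/model_folder")
```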
Select the type of dataset to preprocess.
The path to an existing dataset. Either select a path to a previously created dataset or provide a path to an external dataset.
Name of the model to preprocess the given dataset for. Either select an existing model from the dropdown or provide the name of a new model.
Target sample rate for the audio files in the provided dataset.
Whether to remove low-frequency sounds from the audio files in the provided dataset by applying a high-pass Butterworth filter.
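A sketch of such a filter with SciPy; the 48 Hz cutoff and 5th-order design are illustrative defaults, not necessarily the app's exact parameters:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def highpass(audio: np.ndarray, sr: int, cutoff: float = 48.0, order: int = 5) -> np.ndarray:
    """Attenuate rumble below `cutoff` Hz with a Butterworth high-pass filter."""
    sos = butter(order, cutoff, btype="highpass", fs=sr, output="sos")
    return sosfiltfilt(sos, audio)
```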
Whether to clean the audio files in the provided dataset using noise reduction algorithms.
The method to use for splitting the audio files in the provided dataset. Use the 'Skip' method to skip splitting if the audio files are already split, the 'Simple' method if excessive silence has already been removed from the audio files, and the 'Automatic' method for automatic silence detection and splitting around it.
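To make the 'Simple' and 'Automatic' options concrete, here is a hedged sketch of both, using fixed-length slicing for the former and librosa's silence detection for the latter; the slice length, threshold, and file name are illustrative:

```python
import librosa

audio, sr = librosa.load("dataset_sample.wav", sr=None, mono=True)

# 'Simple'-style splitting: fixed-length slices (3 s here).
step = 3 * sr
simple_chunks = [audio[i : i + step] for i in range(0, len(audio), step)]

# 'Automatic'-style splitting: detect non-silent regions and slice around them.
intervals = librosa.effects.split(audio, top_db=40)
auto_chunks = [audio[start:end] for start, end in intervals]
```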
Name of the model with an associated preprocessed dataset to extract training features from. When a new dataset is preprocessed, its associated model is selected by default.
The method to use for extracting pitch features.
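RVC-based tools commonly offer pitch extractors such as RMVPE or CREPE; as a rough stand-in, librosa's pYIN illustrates what per-frame F0 extraction produces (file name and bounds are illustrative):

```python
import librosa

audio, sr = librosa.load("dataset_sample.wav", sr=16000)
f0, voiced_flag, voiced_prob = librosa.pyin(
    audio,
    fmin=librosa.note_to_hz("C2"),  # ~65 Hz lower bound for typical vocals
    fmax=librosa.note_to_hz("C7"),  # ~2093 Hz upper bound
    sr=sr,
)
# f0 holds one pitch estimate per frame (NaN where the frame is unvoiced).
```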
The model to use for generating speaker embeddings.
Select a custom embedder model from the dropdown.
The type of hardware acceleration to use. 'Automatic' will automatically select the first available GPU and fall back to CPU if no GPUs are available.
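The 'Automatic' option corresponds to the usual PyTorch device-selection pattern:

```python
import torch

def pick_device() -> torch.device:
    """First available GPU, with CPU as the fallback."""
    return torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
```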
Name of the model to train. When training features are extracted for a new model, its name is selected by default.
Whether to detect overtraining to prevent the voice model from learning the training data too well and losing the ability to generalize to new data.
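A minimal sketch of overtraining detection as early stopping; the loss values and patience window are made up, and the app's actual criterion may be more involved:

```python
validation_losses = [0.90, 0.71, 0.64, 0.60, 0.61, 0.62, 0.63]  # made-up values
best, patience, stale = float("inf"), 3, 0
for epoch, loss in enumerate(validation_losses, start=1):
    if loss < best:
        best, stale = loss, 0  # validation improved; reset the counter
    else:
        stale += 1             # no improvement this epoch
        if stale >= patience:
            print(f"No improvement for {patience} epochs; stopping at epoch {epoch}.")
            break
```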
The vocoder to use for audio synthesis during training. HiFi-GAN provides basic audio fidelity, while RefineGAN provides the highest audio fidelity.
The method to use for generating an index file for the trained voice model. 'KMeans' is particularly useful for large datasets.
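A sketch of why the KMeans option helps with large datasets, assuming 768-dimensional features and a flat faiss index; the dimensions, sizes, and file name are assumptions, not the app's exact values:

```python
import faiss
import numpy as np
from sklearn.cluster import MiniBatchKMeans

feats = np.random.rand(20_000, 768).astype(np.float32)  # placeholder features

# KMeans first compresses the feature set to its cluster centroids, so the
# index stays small and fast to search even when the dataset is large.
kmeans = MiniBatchKMeans(n_clusters=2_000, batch_size=4_096).fit(feats)
centroids = kmeans.cluster_centers_.astype(np.float32)

index = faiss.IndexFlatIP(centroids.shape[1])  # exact inner-product index
index.add(centroids)
faiss.write_index(index, "voice_model.index")
```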
The type of pretrained model to finetune the voice model on. 'None' will train the voice model from scratch, while 'Default' will use a pretrained model tailored to the specific voice model architecture. 'Custom' will use a custom pretrained model that you provide.
Select a custom pretrained model to finetune from the dropdown.
Whether to save a unique checkpoint at each save interval. If not enabled, only the latest checkpoint will be saved at each interval.
Whether to save unique voice model weights at each save interval. If not enabled, only the best voice model weights will be saved.
Whether to delete any existing training data associated with the voice model before training commences. Enable this setting only if you are training a new voice model from scratch or restarting training.
Whether to automatically upload the trained voice model so that it can be used for generation tasks within the Ultimate RVC app.
The type of hardware acceleration to use. 'Automatic' will automatically select the first available GPU and fall back to CPU if no GPUs are available.
Whether to preload all training data into GPU memory. This can improve training speed but requires a lot of VRAM.
Whether to reduce VRAM usage at the cost of slower training speed by enabling activation checkpointing. This is useful for GPUs with limited memory (e.g., <6GB VRAM) or when training with a batch size larger than what your GPU can normally accommodate.
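Both of these options map onto standard PyTorch patterns, sketched here with placeholder tensors and a toy module:

```python
import torch
from torch.utils.checkpoint import checkpoint

device = "cuda" if torch.cuda.is_available() else "cpu"

# Preloading: move the whole (hypothetical) feature set to device memory once,
# so each training step skips a host-to-device copy, at the cost of VRAM.
features = torch.randn(10_000, 768).to(device)

# Activation checkpointing: activations inside the wrapped block are
# recomputed during backward instead of stored, lowering peak VRAM usage.
block = torch.nn.Sequential(
    torch.nn.Linear(768, 768), torch.nn.ReLU(), torch.nn.Linear(768, 768)
).to(device)
x = torch.randn(8, 768, device=device, requires_grad=True)
out = checkpoint(block, x, use_reentrant=False)
out.sum().backward()
```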
The name of a configuration to load UI settings from.