Collabora Ltd and LAION have released WhisperSpeech, an open source text-to-speech deep learning model built by inverting Whisper speech-to-text model.
Even though it is still a work in progress, as of 19th Jan 2024, WhisperSpeech’s performance is fantastic and it can be easily loaded and used for inference on a Colab notebook or Kaggle notebook.
In addition to converting text to speech, WhisperSpeech can also be used for voice cloning and it is much faster than any other voice cloning model or system I have used.
Let’s open a Colab notebook and try out WhisperSpeech right away!
Install WhisperSpeech
First I will enable the GPU in my notebook and then install the library WhisperSpeech.
!pip install -Uqq WhisperSpeech
from whisperspeech.pipeline import Pipeline
Next let’s load the fast SD S2A model for text-to-audio conversion.
pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-tiny-en+pl.model')
Convert text to speech using WhisperSpeech
Generating audio is pretty straightforward, use the generate_to_notebook()
method and pass the input text.
pipe.generate_to_notebook("""
As climate change becomes an increasingly urgent global concern,
individuals and communities are embracing eco-friendly practices to reduce
their environmental impact. From adopting renewable energy sources and
minimizing waste to promoting conscious consumption, the shift towards
sustainability is reshaping daily habits. Sustainable living not only
benefits the planet but also fosters a sense of responsibility and mindfulness
in individuals, creating a collective movement towards a more environmentally
conscious future.
""")
Output:
Voice cloning using WhisperSpeech
Now let me show the voice cloning capabilities of WhisperSpeech. The voice that I will clone is given below. Play it once.
Reference voice sample:
I will use the pipeline and the same model.
input_text = """
As climate change becomes an increasingly urgent global concern,
individuals and communities are embracing eco-friendly practices to reduce
their environmental impact. From adopting renewable energy sources and
minimizing waste to promoting conscious consumption, the shift towards
sustainability is reshaping daily habits. Sustainable living not only
benefits the planet but also fosters a sense of responsibility and mindfulness
in individuals, creating a collective movement towards a more environmentally
conscious future.
"""
speaker_path = "https://upload.wikimedia.org/wikipedia/commons/5/52/Australian_man_caught_driving_with_no_licence_twice_in_one_day.ogg"
pipe.generate_to_notebook(input_text, speaker=speaker_path)
Output:
This voice cloning took 43 seconds on the T4 GPU. Not bad!