WhisperSpeech – New Text-To-Speech Model In Town

Collabora Ltd and LAION have released WhisperSpeech, an open source text-to-speech deep learning model built by inverting Whisper speech-to-text model.

Even though it is still a work in progress, as of 19th Jan 2024, WhisperSpeech’s performance is fantastic and it can be easily loaded and used for inference on a Colab notebook or Kaggle notebook.

In addition to converting text to speech, WhisperSpeech can also be used for voice cloning and it is much faster than any other voice cloning model or system I have used.

Let’s open a Colab notebook and try out WhisperSpeech right away!

Install WhisperSpeech

First I will enable the GPU in my notebook and then install the library WhisperSpeech.

!pip install -Uqq WhisperSpeech
from whisperspeech.pipeline import Pipeline

Next let’s load the fast SD S2A model for text-to-audio conversion.

pipe = Pipeline(s2a_ref='collabora/whisperspeech:s2a-q4-tiny-en+pl.model')

Convert text to speech using WhisperSpeech

Generating audio is pretty straightforward, use the generate_to_notebook() method and pass the input text.

pipe.generate_to_notebook("""
As climate change becomes an increasingly urgent global concern, 
individuals and communities are embracing eco-friendly practices to reduce 
their environmental impact. From adopting renewable energy sources and 
minimizing waste to promoting conscious consumption, the shift towards 
sustainability is reshaping daily habits. Sustainable living not only 
benefits the planet but also fosters a sense of responsibility and mindfulness 
in individuals, creating a collective movement towards a more environmentally 
conscious future.
""")

Output:

Voice cloning using WhisperSpeech

Now let me show the voice cloning capabilities of WhisperSpeech. The voice that I will clone is given below. Play it once.

Reference voice sample:

I will use the pipeline and the same model.

input_text = """
As climate change becomes an increasingly urgent global concern, 
individuals and communities are embracing eco-friendly practices to reduce 
their environmental impact. From adopting renewable energy sources and 
minimizing waste to promoting conscious consumption, the shift towards 
sustainability is reshaping daily habits. Sustainable living not only 
benefits the planet but also fosters a sense of responsibility and mindfulness 
in individuals, creating a collective movement towards a more environmentally 
conscious future.
"""

speaker_path = "https://upload.wikimedia.org/wikipedia/commons/5/52/Australian_man_caught_driving_with_no_licence_twice_in_one_day.ogg"

pipe.generate_to_notebook(input_text, speaker=speaker_path)

Output:

This voice cloning took 43 seconds on the T4 GPU. Not bad!

Leave a Reply

Your email address will not be published. Required fields are marked *