Mozilla’s speech synthesizer: TTS

What is Mozilla TTS 🦊

TTS is a tool and toolkit written in Python that synthesizes speech from text in a natural manner, imitating human intonation so that it does not sound like a machine “speaking.” It was developed by the Mozilla Foundation 🦊 (the foundation behind Firefox).

Mozilla TTS is currently state-of-the-art in the field of human voice generation, being far superior to most of the alternatives that have been used in recent years (Loquendo, Festival, etc.). It even surpasses the voices of assistants like Google, Cortana, or Alexa.

Here is an example I created for Makiai.

This is possible with the integration of end-to-end models like VITS, which has revolutionized the industry.

With Mozilla TTS, you can synthesize voice in more than 20 languages using various models, not just VITS. It is powerful and easy to use and, unlike other leading options, it does not consume excessive resources.

Moreover, it is open source 🦙 and free software, so it can be used with few legal limitations, as we will see later in the licensing section.

How to install Mozilla TTS on Linux 🍣

Here I will explain how to install this tool on Linux, specifically on Fedora, so the tutorial also applies to other RPM-based distributions such as CentOS. If you use a DEB-based distro such as Debian or Ubuntu, all you need to change is the installer: wherever you see “dnf install,” use “apt-get install” instead.

🚧 One more thing: if you have less than 8 GB of RAM, your computer may not be able to install or run it.

With that said, let’s get started… 😎

Step 1: Check the Python version

Open the terminal and run the following command:

python3 --version

This shows the installed Python version. If it is lower than 3.7, update it.
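The same check can also be done from Python itself. The following is a minimal sketch using only the standard library; the function name python_is_recent_enough is made up for this illustration, and the 3.7 threshold is the one mentioned above.

```python
import sys

# Minimum version named in this tutorial.
MIN_VERSION = (3, 7)


def python_is_recent_enough(version=None, minimum=MIN_VERSION):
    """Return True if a (major, minor, ...) version meets the minimum."""
    if version is None:
        version = sys.version_info
    # Compare only major and minor, e.g. (3, 6) < (3, 7).
    return tuple(version[:2]) >= tuple(minimum)


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if python_is_recent_enough():
        print("Python {} is new enough for this tutorial.".format(current))
    else:
        print("Python {} is older than 3.7; update it first.".format(current))
```

Save it to a file and run it with python3; it only reports the result and changes nothing on your system.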

Step 2: Install pip

pip is used to install Python packages. Run the following command in the terminal:

sudo dnf install python3-pip

Step 3: Install Mozilla TTS

pip install TTS

You can make yourself a coffee while it installs, as it can easily take more than 10 minutes. ☕

End of installation! Now you can use it.

How to use TTS 🎤

TTS can be used for many things: training your own voice model, using a model you already have, cloning voices, and so on. Its main function, and the one that concerns us in this article, is synthesizing voice from text. There are several ways to do that, and I am going to show the one I think is the easiest.

We will simply use a single command:

tts --text "Text you want to synthesize" --model_name "YourModelNameHere" --out_path "audiotts.wav"

All you need to do is run that command in the terminal, and it will synthesize the text, creating a WAV audio file at the specified path. Note that changing .wav to .mp3 in the filename only changes the name, not the format. If you need MP3, convert the WAV file afterwards with an audio tool such as ffmpeg.
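If you would rather launch the command from a Python script, for example to synthesize several texts in a row, a small wrapper is enough. This is only a sketch: build_tts_command and synthesize are hypothetical helper names, and the only flags used are the three shown in the command above.

```python
import shutil
import subprocess


def build_tts_command(text, model_name, out_path):
    """Build the argument list for the tts command shown above."""
    return [
        "tts",
        "--text", text,
        "--model_name", model_name,
        "--out_path", out_path,
    ]


def synthesize(text, model_name, out_path):
    """Run the tts command; raises CalledProcessError if it fails."""
    if shutil.which("tts") is None:
        raise RuntimeError("The tts command was not found; is TTS installed?")
    # Passing a list (not a single string) avoids shell quoting problems.
    subprocess.run(build_tts_command(text, model_name, out_path), check=True)
    return out_path


# Example call (not executed here):
# synthesize("Text you want to synthesize", "YourModelNameHere", "audiotts.wav")
```

Passing the arguments as a list means quotes or other special characters in the text do not need escaping.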

And you might wonder, how do I know which models are available? It’s very simple. Open the terminal and run the following command to list all available models in TTS:

tts --list_models

There are many models. When you run TTS to synthesize text, it downloads the selected model and uses it. If you have used that model before, it will not be downloaded again. The models are a few hundred MB each, so you just need to choose one from the list and include it in the command.

As you will see, the models in the list have /es/, /en/, /de/, etc., in the middle. These indicate the language in which each model synthesizes audio well. Some models are available for multiple languages. As I mentioned at the beginning, we can synthesize in more than 20 languages.

And what happens if I download an English model and synthesize Spanish text? Nothing, except that the quality will be worse and the voice will have an English accent.
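To pick out the models for one language, you can filter the names by that language segment. The sketch below assumes the layout seen in names such as tts_models/es/css10/vits, where the language is the second part between slashes; the helper names and the short example list are illustrative only, so run tts --list_models to get the real list.

```python
def model_language(model_name):
    """Return the language segment of a name like 'tts_models/es/css10/vits'."""
    parts = model_name.split("/")
    # Assumed layout: type/language/dataset/model.
    if len(parts) < 2:
        raise ValueError("Unexpected model name: {!r}".format(model_name))
    return parts[1]


def models_for_language(model_names, language):
    """Keep only the model names whose language segment matches."""
    return [name for name in model_names if model_language(name) == language]


# Illustrative names only, not the full list.
example_models = [
    "tts_models/es/css10/vits",
    "tts_models/en/ljspeech/vits",
    "tts_models/de/thorsten/vits",
]

print(models_for_language(example_models, "es"))
# prints ['tts_models/es/css10/vits']
```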

To make things easier for the reader, here is what I believe to be the best Spanish model, VITS. You can use it with the following command:

tts --text "Test text from the Makiai.com website" --model_name "tts_models/es/css10/vits" --out_path "makiai.wav"

It will take a few seconds or minutes, depending on the length of the text and the power of your computer.
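Once the command finishes, a quick way to confirm that the file was written correctly is to read its duration. This is a rough sketch that assumes the output is a standard PCM WAV file, which Python's built-in wave module can read; wav_duration_seconds is a hypothetical helper name.

```python
import os
import wave


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav_file:
        frames = wav_file.getnframes()
        rate = wav_file.getframerate()
    return frames / float(rate)


if __name__ == "__main__":
    # File name taken from the command above; skipped if it does not exist.
    if os.path.exists("makiai.wav"):
        print("Duration: {:.1f} s".format(wav_duration_seconds("makiai.wav")))
```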

And that’s all.

Mozilla TTS License

Mozilla TTS uses the MPL-2.0 license.

Summary: You can use it almost as you like; you can commercialize it, modify it, use it in your company, etc.

MPL-2.0 (Mozilla Public License 2.0) is an open-source license with weak copyleft that allows use, modification, and distribution of the software and includes a patent grant. It also permits commercial and private use. The license has certain limitations and conditions to ensure the availability of the source code and the preservation of copyright and license notices.

Summary of MPL-2.0:

Permissions:

  • Commercial use
  • Modification
  • Distribution
  • Patent use
  • Private use

Limitations:

  • Liability
  • Trademark use
  • Warranty

Conditions:

  • Source code disclosure
  • License and copyright notification
  • Same license (file)

Examples of using the MPL-2.0 license:

  1. A developer creates software under the MPL-2.0 license. Other developers can use, modify, and distribute this software, but they must keep the same license for the modified files and provide the source code for those files.
  2. A developer creates a software library under MPL-2.0, and another developer integrates it into a larger application. The larger application can be distributed under different terms and without the source code of the additional files, as long as the conditions of MPL-2.0 for the original library files are respected.
  3. A company uses software under MPL-2.0 for its internal operations. Purely internal use does not trigger the source code disclosure condition; that condition applies only if the company distributes the software, in which case it must provide the source code of any MPL-covered files it modified and keep the copyright and license notices in them. In either case, the company is not required to share the source code of the additional files it created for its own application.