End-to-end GAN-TTS architecture

A TensorFlow implementation of the GAN-TTS paper.

Text embeddings are generated by a pre-trained TensorFlow BERT model.
Linguistic features are not predicted by external models; instead, they are predicted by a feature net that works together with the generator and the discriminator. The feature net is a simple CBHG module that takes a text embedding as input and outputs a tensor of linguistic features.
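
The CBHG module, as introduced in Tacotron, is a bank of 1-D convolutions, max-pooling over time, two convolutional projections with a residual connection, highway layers, and a bidirectional GRU. The following is a minimal NumPy forward-pass sketch of such a module, not this repository's code: the layer sizes, the random weights, the omitted batch normalization and GRU biases, and the final linear layer to `out_dim` are all assumptions, and `CBHGFeatureNet`, `conv1d_same` and `gru` are hypothetical names.

```python
import numpy as np


def sigmoid(x):
    # tanh form of the logistic function; avoids overflow in np.exp
    return 0.5 * (1.0 + np.tanh(0.5 * x))


def relu(x):
    return np.maximum(x, 0.0)


def conv1d_same(x, w):
    """Stride-1 1-D convolution with 'same' padding. x: (T, C_in), w: (k, C_in, C_out)."""
    k, steps = w.shape[0], x.shape[0]
    xp = np.pad(x, (((k - 1) // 2, k // 2), (0, 0)))
    return sum(xp[i:i + steps] @ w[i] for i in range(k))


def gru(x, p, reverse=False):
    """Single-direction GRU over time, biases omitted. x: (T, C) -> (T, units)."""
    steps, units = x.shape[0], p["Uz"].shape[0]
    h, out = np.zeros(units), np.zeros((steps, units))
    for t in (reversed(range(steps)) if reverse else range(steps)):
        z = sigmoid(x[t] @ p["Wz"] + h @ p["Uz"])  # update gate
        r = sigmoid(x[t] @ p["Wr"] + h @ p["Ur"])  # reset gate
        h_cand = np.tanh(x[t] @ p["Wh"] + (r * h) @ p["Uh"])
        h = (1.0 - z) * h + z * h_cand
        out[t] = h
    return out


class CBHGFeatureNet:
    """Hypothetical CBHG feature net: (T, in_dim) embeddings -> (T, out_dim) features.
    Random weights, no batch normalization: a shape-level sketch, not a trained model."""

    def __init__(self, in_dim, out_dim, K=8, bank_ch=32, proj_ch=64,
                 n_highway=4, gru_units=32, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            # scale by 1/sqrt(fan_in) so activations stay O(1)
            return rng.normal(0.0, 1.0 / np.sqrt(np.prod(shape[:-1])), shape)

        def gru_params():
            return {n: init(in_dim if n[0] == "W" else gru_units, gru_units)
                    for n in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")}

        self.bank = [init(k, in_dim, bank_ch) for k in range(1, K + 1)]
        self.proj1 = init(3, K * bank_ch, proj_ch)
        self.proj2 = init(3, proj_ch, in_dim)
        self.highway = [(init(in_dim, in_dim), init(in_dim, in_dim)) for _ in range(n_highway)]
        self.gru_fw, self.gru_bw = gru_params(), gru_params()
        self.out = init(2 * gru_units, out_dim)

    def __call__(self, x):
        # 1. Conv bank: kernel widths 1..K, outputs concatenated along channels.
        y = np.concatenate([relu(conv1d_same(x, w)) for w in self.bank], axis=-1)
        # 2. Max-pool over time with width 2, stride 1: sequence length is unchanged.
        y = np.maximum(y, np.vstack([y[1:], y[-1:]]))
        # 3. Two conv projections (ReLU, then linear) plus a residual connection.
        y = relu(conv1d_same(y, self.proj1))
        y = conv1d_same(y, self.proj2) + x
        # 4. Highway layers: gate t mixes the transformed path with the identity.
        for w_h, w_t in self.highway:
            t = sigmoid(y @ w_t)
            y = relu(y @ w_h) * t + y * (1.0 - t)
        # 5. Bidirectional GRU; forward and backward outputs concatenated.
        y = np.concatenate([gru(y, self.gru_fw), gru(y, self.gru_bw, reverse=True)], axis=-1)
        # 6. Assumed final linear layer to the linguistic-feature size.
        return y @ self.out


net = CBHGFeatureNet(in_dim=768, out_dim=128)
emb = np.random.default_rng(1).normal(size=(20, 768))
print(net(emb).shape)  # (20, 128)
```

Because every stage uses stride 1, the module emits one feature vector per input position: for example, a `(T, 768)` embedding (the hidden size of BERT-base) becomes a `(T, out_dim)` feature tensor. The 128-dimensional output in the usage line is an arbitrary choice.
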
You can explore the data flow and data dimensionality using the notebook. The discriminator used in the notebook differs from the original one because the Colab GPU could not handle the original discriminator.
I trained the model on a very small dataset (17 audio-text pairs from LJSpeech) because I did not have access to a sufficiently powerful machine.
To evaluate this GAN, I used the Fréchet distance, with all embeddings computed by the pre-trained TensorFlow VGGish model.
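
The Fréchet distance fits a Gaussian to each set of embeddings and compares the two: d^2 = ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)), where mu and S are the mean and covariance of the real (r) and generated (g) embeddings. This is the same recipe the Fréchet Audio Distance uses with VGGish. The NumPy sketch below is an illustration, not this repository's code; it assumes the VGGish embeddings have already been extracted into `(N, 128)` arrays, and the function names are hypothetical.

```python
import numpy as np


def _sqrtm_psd(m):
    """Square root of a symmetric positive semi-definite matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(m)
    return (vecs * np.sqrt(np.clip(vals, 0.0, None))) @ vecs.T


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Squared Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2).

    Tr((S1 S2)^(1/2)) equals Tr((S1^(1/2) S2 S1^(1/2))^(1/2)); the inner matrix is
    symmetric PSD, so its eigenvalues are real and non-negative.
    """
    diff = mu1 - mu2
    s1_half = _sqrtm_psd(sigma1)
    inner = s1_half @ sigma2 @ s1_half
    tr_covmean = np.sum(np.sqrt(np.clip(np.linalg.eigvalsh(inner), 0.0, None)))
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * tr_covmean)


def frechet_distance_from_embeddings(real, generated):
    """real, generated: (N, D) arrays of embeddings, e.g. 128-d VGGish vectors."""
    mu_r, mu_g = real.mean(axis=0), generated.mean(axis=0)
    sig_r = np.cov(real, rowvar=False)
    sig_g = np.cov(generated, rowvar=False)
    return frechet_distance(mu_r, sig_r, mu_g, sig_g)


# Two 2-D Gaussians: ||mu diff||^2 = 25, trace term = 1 + 4 - 4 + 1 = 2
print(frechet_distance(np.zeros(2), np.eye(2), np.array([3.0, 4.0]), 4.0 * np.eye(2)))  # ≈ 27.0
```

Taking the trace through the symmetric matrix S_r^(1/2) S_g S_r^(1/2) keeps every quantity real and tolerates singular covariances. That matters for a small dataset: when there are fewer embedding vectors than the 128 VGGish dimensions, the sample covariance is rank-deficient.
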

Images
LJSpeech
LJSpeechTest
Models
Preprocessing
Tests
Training
Utils
.dockerignore
.gitignore
Dockerfile
E2EGANTTS.ipynb
README.md
