This was a research project done as part of the Major Project course. The corresponding paper has been accepted at ICACCP 2021.
Emotions are an essential part of speech and communication, and so cannot be neglected. Existing text-to-speech systems are not effective at conveying the emotions behind text: they read it out monotonically, lacking expressiveness. In this paper, an Expressive Text-to-Speech Synthesis System (ETSSS) is proposed that accounts for the dominant emotion in the input text. ETSSS works in two stages: first, it assigns an emotion label to the text; second, it produces expressive speech. In the first stage, the input text is given an emotion label, which is then used to generate expressive, prosodic speech. Emotion labeling in ETSSS is carried out using BERT, which achieves an accuracy of 94%, 90%, and 90% for disgust, amused, and anger respectively. The speech synthesis with the emotion module of ETSSS achieves a good MOS of 3.8 for anger, 3.5 for disgust, and 3.2 for amused.
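The two-stage flow described above can be sketched as follows. This is only an illustrative outline: the keyword classifier is a trivial stand-in for the fine-tuned BERT model, and all function names and prosody values are hypothetical, not taken from the paper.

```python
# Illustrative sketch of the ETSSS two-stage pipeline.
# Stage 1 (classify_emotion) stands in for the BERT emotion classifier;
# Stage 2 (synthesize) stands in for the emotion-conditioned synthesizer.
# All cue words and prosody numbers below are placeholders.

# Placeholder keyword cues standing in for a fine-tuned BERT classifier.
_CUES = {
    "anger": ("furious", "hate", "angry"),
    "disgust": ("gross", "disgusting", "revolting"),
    "amused": ("funny", "hilarious", "laugh"),
}


def classify_emotion(text: str) -> str:
    """Stage 1: assign one of the three emotion labels to the input text."""
    lowered = text.lower()
    for label, cues in _CUES.items():
        if any(cue in lowered for cue in cues):
            return label
    return "anger"  # arbitrary fallback for this sketch


def synthesize(text: str, label: str) -> dict:
    """Stage 2: map the emotion label to prosody settings for synthesis."""
    prosody = {
        "anger":   {"pitch_shift": +2.0, "rate": 1.15, "energy": 1.3},
        "disgust": {"pitch_shift": -1.0, "rate": 0.90, "energy": 1.1},
        "amused":  {"pitch_shift": +1.5, "rate": 1.05, "energy": 1.2},
    }[label]
    return {"text": text, "emotion": label, "prosody": prosody}


def etsss(text: str) -> dict:
    """Full pipeline: label the text, then condition synthesis on the label."""
    return synthesize(text, classify_emotion(text))
```

In the real system, the dictionary lookup in stage 1 is replaced by a BERT forward pass, and stage 2 feeds the label into the expressive speech synthesizer rather than returning a prosody dictionary.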