Title: MAKEDONKA: Applied Deep Learning Model for Text-to-Speech Synthesis in Macedonian Language
Authors: Mishev, Kostadin; Karovska Ristovska, Aleksandra; Trajanov, Dimitar; Eftimov, Tome; Simjanoska, Monika
Issue Date: 2020
Publisher: MDPI AG
Journal: Applied Sciences
Abstract: This paper presents MAKEDONKA, the first open-source Macedonian language synthesizer based on the Deep Learning approach. The paper provides an overview of the numerous earlier attempts to achieve human-like, reproducible speech, which have unfortunately proven unsuccessful because the work had little visibility and lacked examples of integration with real software tools. Recent advances in Machine Learning, namely Deep Learning-based methodologies, provide novel methods for feature engineering that allow for smooth transitions in the synthesized speech, making it sound natural and human-like. This paper presents a methodology for end-to-end speech synthesis based on Deep Voice 3, a fully-convolutional sequence-to-sequence acoustic model with a position-augmented attention mechanism. Our model synthesizes Macedonian speech directly from characters. We created a dataset containing approximately 20 h of speech from a native Macedonian female speaker and used it to train the text-to-speech (TTS) model. The achieved mean opinion score (MOS) of 3.93 makes our model suitable for any kind of software that needs a text-to-speech service in the Macedonian language. Our TTS platform is publicly available for use and ready for integration.
DOI: 10.3390/app10196882
Appears in Collections: Faculty of Computer Science and Engineering: Journal Articles

Files in This Item:
File: applsci-10-06882.pdf
Size: 1.3 MB
Format: Adobe PDF

Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.