Syllable and Morpheme Segmentation of Macedonian Language
Date Issued
2023-05
Author(s)
Mitreska, Maja
Abstract
Communication is the key to human development. Approximately 5% of the world’s population experiences some form of hearing disability. Modern assistive devices and technologies can improve the communication skills of hearing-impaired people by transcribing speech into text. The creation of such an application depends on the language-specific morphosyntactic properties, and it usually starts with syllabification. The research presented in this paper focuses on the development of an automatic system for rule-based and sonority-based syllable and morpheme segmentation of the Macedonian language, which can be easily incorporated into an efficient speech recognition system. The segmentation rules for breaking words down into syllables and into morphemes were created according to the new orthography of the Macedonian language. For the sonority-based approach, a novel phonological distance measure capable of efficient syllable clustering was introduced. The framework is implemented in Python, using several data structures to optimize performance and CPU usage. Both segmentation strategies were evaluated using an electronic lexicon of more than one million words. A linguistic expert was consulted during the entire process. The consistency of the obtained results suggests that they are suitable for further speech processing applications.
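To illustrate what a sonority-based syllable split can look like, here is a minimal Python sketch. It is not the paper's algorithm and does not reproduce its phonological distance measure or its orthographic rules. The four-level sonority scale, the `syllabify` function name, and the "longest rising onset" heuristic are all assumptions made for illustration, and syllabic "р" is not handled.

```python
# Hypothetical sketch of sonority-based syllabification for Macedonian
# Cyrillic. The scale below is an assumed simplification, not taken from
# the paper: vowels > liquids/glides > nasals > all other consonants.
VOWELS = set("аеиоу")
SONORITY = {**{c: 3 for c in "рлљј"}, **{c: 2 for c in "мнњ"}}


def sonority(ch: str) -> int:
    """Return the assumed sonority level of a single letter."""
    if ch in VOWELS:
        return 4
    return SONORITY.get(ch, 1)  # obstruents and anything unknown


def syllabify(word: str) -> list[str]:
    """Split a word before the longest onset whose sonority rises
    strictly toward the following vowel (a common heuristic)."""
    word = word.lower()
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    if len(nuclei) < 2:
        return [word]  # zero or one vowel: nothing to split

    boundaries = []
    for left, right in zip(nuclei, nuclei[1:]):
        start = right
        # Walk left from the vowel while sonority keeps falling,
        # without crossing the previous nucleus.
        while start - 1 > left and sonority(word[start - 1]) < sonority(word[start]):
            start -= 1
        boundaries.append(start)

    pieces, prev = [], 0
    for b in boundaries:
        pieces.append(word[prev:b])
        prev = b
    pieces.append(word[prev:])
    return pieces


if __name__ == "__main__":
    for w in ["македонија", "книга", "сестра"]:
        print(w, "->", "-".join(syllabify(w)))
```

The splits this heuristic produces are not guaranteed to match the orthographic rules used in the paper. For example, it splits "сестра" as "сес-тра" because "с" and "т" share the same assumed sonority level.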
File(s)
Name
19_ais_8584.pdf
Size
1.01 MB
Format
Adobe PDF
Checksum
(MD5):ba9b9058098f91af21b68f11968724fc
