Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/33947
Title: Sonority Based Syllabification of Macedonian and Serbian
Authors: Zdravkova, Katerina 
Kuzmanova, Jana
Keywords: Macedonian, Serbian, phoneme sonority, syllabification
Issue Date: 21-Nov-2024
Journal: Technical editors
Abstract: Phonetically, syllables are sequences of sounds that contain a single peak of prominence, while phonologically they are units of stress placement. According to the Sonority Sequencing Principle, sonority within a syllable rises towards the nucleus and then falls. There have been several previous attempts to syllabify Macedonian and Serbian words. The accuracy of the Macedonian experiment was not evaluated on a specific corpus, while the accuracy of the Serbian syllabification exceeded 98%. The rule-based approach was rather complex compared to the sonority-based syllabification that we proposed for Macedonian and now extend to Serbian. The sonority of Macedonian phonemes depends on their basic classification: vowels (weight 12), sonorants (4), voiced consonants (2) and voiceless consonants (1). When the sonorant р (Latin transcription: r) is between two consonants, it becomes a syllable carrier, and therefore its sonority is higher, initially 6. Two adjacent vowels are separated by a fictitious consonant FC. The sonority of Serbian phonemes is more complex and embraces additional classes: vowels (12), sonorant р (8), sonorants and voiced plosives (4), voiceless plosives and voiced fricatives (3), voiceless fricatives and voiced affricates (2), and voiceless affricates (1). In both languages, the five vowels are syllable nuclei. In Macedonian, the sonorant р can also be a nucleus when it appears within a consonant group (крст, вр-ста, пр-вен-ство) or at the end of the word (ма-са-кр). In Serbian, apart from the sonorant р (тврд, црв, тр-ка), the sonorants л and н can also become syllable nuclei (for example, би-ци-кл, Вл-та-ва, Њу-тн). Nuclei are determined by calculating the triplet difference between the sonority of the current phoneme and that of its left and right neighbours. Syllable boundaries are then determined by whether the sonority is monotonically non-decreasing or decreasing.
In Macedonian, whenever the sonority of two consonants is non-decreasing, they are split into two adjacent syllables. In Serbian, in the same case, both consonants are part of the second syllable. In Macedonian, the accuracy of the baseline algorithm was rather low, mainly because the suffixes ски, ство and ствен and their inflections, which should remain within one syllable, were separated. By adjusting this, we achieved an accuracy of 95.60%, evaluated on a corpus of more than 1000 words. However, the adjustment affected the syllabification of nouns such as гус-ки, мас-ки and прас-ки, in which ски is not a morpheme. On a sample of more than 3000 syllabified Serbian words, the accuracy of the baseline algorithm was 97.59%. By modifying the sonority of р to 6, the accuracy reached 98.54%, exceeding the accuracy of the rule-based syllabification. The approach we proposed is extremely simple and, at the same time, very efficient. We intend to further improve it by taking into account the PoS tags for the Macedonian language and the exclusions for Serbian, hoping to reach an accuracy of over 99%.
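As a rough illustration of the Macedonian weights quoted in the abstract, the following Python sketch computes a sonority profile and locates candidate syllable nuclei. It is not the authors' implementation. The grouping of letters into classes, the function names (`is_syllabic_r`, `sonority_profile`, `nuclei`) and the simplified test for syllabic р (no vowel on either side, with word edges counted as non-vowels) are assumptions made here; the abstract itself describes a triplet-difference calculation that this sketch does not reproduce.

```python
# Weights for Macedonian as quoted in the abstract.
WEIGHTS = {"vowel": 12, "syllabic_r": 6, "sonorant": 4, "voiced": 2, "voiceless": 1}

# Assumption: a conventional grouping of Macedonian letters, not taken from the paper.
VOWELS = set("аеиоу")
SONORANTS = set("мнњлљрј")
VOICED = set("бвгдѓжзѕџ")
VOICELESS = set("пфктќшсцчх")


def is_syllabic_r(word, i):
    """Simplified test: р is a syllable carrier when neither neighbour is a vowel.

    Word edges count as non-vowels, so a word-final р after a consonant qualifies.
    """
    if word[i] != "р":
        return False
    left_is_vowel = i > 0 and word[i - 1] in VOWELS
    right_is_vowel = i + 1 < len(word) and word[i + 1] in VOWELS
    return not left_is_vowel and not right_is_vowel


def sonority_profile(word):
    """Return the list of sonority weights, one per letter of the word."""
    word = word.lower()
    profile = []
    for i, ch in enumerate(word):
        if ch in VOWELS:
            profile.append(WEIGHTS["vowel"])
        elif is_syllabic_r(word, i):
            profile.append(WEIGHTS["syllabic_r"])
        elif ch in SONORANTS:
            profile.append(WEIGHTS["sonorant"])
        elif ch in VOICED:
            profile.append(WEIGHTS["voiced"])
        elif ch in VOICELESS:
            profile.append(WEIGHTS["voiceless"])
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return profile


def nuclei(word):
    """Return the indices of candidate syllable nuclei (vowels and syllabic р)."""
    word = word.lower()
    return [i for i, ch in enumerate(word)
            if ch in VOWELS or is_syllabic_r(word, i)]


print(sonority_profile("крст"))  # [1, 6, 1, 1]
print(nuclei("масакр"))          # [1, 3, 5]
```

The number of nuclei equals the number of syllables, so ма-са-кр yields three indices. Placing the boundaries between nuclei, including the ски/ство/ствен adjustment and the fictitious consonant between adjacent vowels, is deliberately left out because the abstract does not specify it in enough detail.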
URI: http://hdl.handle.net/20.500.12188/33947
Appears in Collections: Faculty of Computer Science and Engineering: Journal Articles

Files in This Item:
File                                Size       Format
JUDIG-2024-book of abstracts.pdf    2.12 MB    Adobe PDF
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.