Learning syntactic tagging of Macedonian language
Journal
Computer Science and Information Systems
Date Issued
2018
Author(s)
Bonchanoski, Martin
Abstract
This paper presents the creation of machine-learning-based systems for
part-of-speech (PoS) tagging of the Macedonian language. Four well-known PoS
tagging systems, originally implemented for English and other Slavic languages,
were trained: TnT, the cyclic dependency network, the guided learning framework
for bidirectional sequence classification, and dynamic feature induction.
Orwell’s novel “1984” was manually tagged by the authors and split into a
training set and a test set. After training, the models were compared. In the
end, a PoS tagger reaching an accuracy of 97.5% was obtained, making it well
suited for the future grammatical tagging of the National corpus of the
Macedonian language, which is currently at an initial stage. The PoS tagger
that was created is published online and is free to use.
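The abstract mentions a split of the tagged novel into training and test sets and reports an accuracy figure, but does not describe the evaluation procedure in detail. The Python code below is a minimal sketch, assuming accuracy means the share of correctly tagged tokens in a held-out test set. It splits a tagged corpus contiguously and scores a simple most-frequent-tag baseline. The baseline is a deliberately swapped-in stand-in, not TnT or any of the four taggers named above. The helper names (`split_corpus`, `train_baseline`, `token_accuracy`), the 0.75 split ratio, the toy sentences and the simplified tag set are all hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

# A sentence is a list of (word, tag) pairs.
Sentence = List[Tuple[str, str]]


def split_corpus(sentences: List[Sentence], train_ratio: float = 0.9) -> Tuple[List[Sentence], List[Sentence]]:
    """Contiguous split: the first train_ratio of sentences train, the rest test."""
    cut = int(len(sentences) * train_ratio)
    return sentences[:cut], sentences[cut:]


def train_baseline(train: List[Sentence]) -> Tuple[Dict[str, str], str]:
    """Hypothetical most-frequent-tag baseline (NOT TnT or any tagger from the paper)."""
    per_word: Dict[str, Counter] = defaultdict(Counter)
    tag_totals: Counter = Counter()
    for sentence in train:
        for word, tag in sentence:
            per_word[word][tag] += 1
            tag_totals[tag] += 1
    # Each known word gets the tag it carried most often in training.
    lexicon = {word: tags.most_common(1)[0][0] for word, tags in per_word.items()}
    # Unknown words fall back to the most frequent tag overall.
    fallback = tag_totals.most_common(1)[0][0]
    return lexicon, fallback


def token_accuracy(test: List[Sentence], lexicon: Dict[str, str], fallback: str) -> float:
    """Share of test tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for sentence in test:
        for word, gold in sentence:
            predicted = lexicon.get(word, fallback)
            correct += int(predicted == gold)
            total += 1
    return correct / total if total else 0.0


# Hypothetical toy corpus with a simplified tag set (not the paper's tagset).
corpus: List[Sentence] = [
    [("Беше", "VERB"), ("ведар", "ADJ"), ("ден", "NOUN")],
    [("ден", "NOUN"), ("беше", "VERB"), ("студен", "ADJ")],
    [("ден", "NOUN")],
    [("студен", "ADJ"), ("април", "NOUN"), ("светол", "ADJ")],
]

train, test = split_corpus(corpus, train_ratio=0.75)
lexicon, fallback = train_baseline(train)
print(f"{token_accuracy(test, lexicon, fallback):.3f}")  # prints 0.667
```

In the toy run, "светол" never appears in training, so it receives the fallback tag NOUN and is counted as an error. A real comparison of taggers would keep the same held-out sentences and swap in each trained model for `train_baseline`, so that every accuracy is measured on identical text.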
File(s)
Name
1820-02141800027B.pdf
Size
676.17 KB
Format
Adobe PDF
Checksum (MD5)
ded5e68b27bb56460ec2db8ee37d3d5f
