Resources for Machine Translation of the Macedonian Language
Date Issued
2009
Author(s)
Stolić, Milosh
Abstract
This paper focuses on creating new linguistic resources for the
Macedonian language. It presents a new Macedonian-Serbian parallel corpus,
built around the digitized version of George Orwell's
"1984" developed during the MULTEXT-EAST project. The original corpus is
expanded with news articles from the Southeast European Times newspaper,
published in the public domain. The paper describes the retrieval, conversion, preprocessing, filtering and sentence alignment of the corpus, then discusses and
evaluates the alignment results.
File(s)
Name
Resources_for_Machine_Translation_of_the_Macedonia.pdf
Size
117.94 KB
Format
Adobe PDF
Checksum
(MD5): c463ee177c023a7f7d1c9792517526dc
