Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/27641
Title: NLP-based Typo Correction Model for Croatian Language
Authors: Mitreska, Maja
Mishev, Kostadin 
Simjanoska, Monika
Keywords: Natural language processing, Typo correction, Croatian language
Issue Date: 23-May-2022
Publisher: IEEE
Conference: 2022 45th Jubilee International Convention on Information, Communication and Electronic Technology (MIPRO)
Abstract: Spelling correction plays an important role in complex NLP-based applications and pipelines. Many of the existing models and techniques were developed for English, because it has the most resources available for training such models. Fortunately, a few of these methodologies can be adapted to other, low-resource languages. In this paper, we explore the power of the Neuspell Toolkit for training an original spelling correction model for the Croatian language. The toolkit itself comprises ten different models, but for the purposes of our work we leverage pre-trained transformer networks, due to their experimentally proven spelling correction efficiency in English. The comparison is performed over different pre-trained Subword BERT architectures, including BERT Multilingual, DistilBERT, and XLM-RoBERTa, chosen for their subword representation support for the Croatian language. The training is done as a sequence labeling task on a newly created parallel Croatian dataset in which the noisy examples are synthetically generated and each misspelled word is labeled with its correct version. Finally, the model is tested in vivo as part of our originally developed speech-to-text model for the Croatian language.
URI: http://hdl.handle.net/20.500.12188/27641
Appears in Collections:Faculty of Computer Science and Engineering: Conference papers
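
The abstract describes a parallel dataset in which noisy sentences are generated synthetically and every misspelled word is labeled with its correct form. The record does not say how the noise was produced, so the following is only a minimal sketch of one common approach, character-level edits applied per word, under stated assumptions. The names `corrupt_word`, `make_parallel_example`, and `noise_prob`, the choice of edit operations, and the example sentence are all hypothetical; this is not the authors' procedure and it does not use the Neuspell API.

```python
import random

# Single-character Croatian lowercase letters (assumption: inserted and
# replaced characters are drawn uniformly from this set).
CROATIAN_LETTERS = "abcčćdđefghijklmnoprsštuvzž"


def corrupt_word(word, rng):
    """Apply one random character-level edit; the result is never empty.

    Note: a swap of identical neighbours or a replace with the same letter
    can leave the word unchanged, which is acceptable for a sketch.
    """
    ops = ["insert", "replace"]
    if len(word) > 1:
        # Deleting or swapping only makes sense with at least two characters.
        ops += ["delete", "swap"]
    op = rng.choice(ops)
    if op == "swap":
        i = rng.randrange(len(word) - 1)
        return word[:i] + word[i + 1] + word[i] + word[i + 2:]
    i = rng.randrange(len(word))
    if op == "delete":
        return word[:i] + word[i + 1:]
    letter = rng.choice(CROATIAN_LETTERS)
    if op == "insert":
        return word[:i] + letter + word[i:]
    return word[:i] + letter + word[i + 1:]  # replace


def make_parallel_example(sentence, rng, noise_prob=0.3):
    """Return (noisy_tokens, labels) with one label per noisy token.

    Keeping the two lists the same length is what lets the task be framed
    as sequence labeling: token i of the noisy input is labeled with the
    clean token i.
    """
    clean = sentence.split()
    noisy = [corrupt_word(w, rng) if rng.random() < noise_prob else w
             for w in clean]
    return noisy, clean


# Usage: print each (possibly corrupted) token next to its label.
rng = random.Random(0)
noisy, labels = make_parallel_example("Danas je lijep dan u Zagrebu", rng)
for token, label in zip(noisy, labels):
    print(f"{token} -> {label}")
```

Because each edit acts inside a single whitespace-delimited word, the noisy and clean sides stay token-aligned, which is the property a word-level sequence labeling setup relies on.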


Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.