Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/17588
Title: Систем за одговарање прашања со повеќекратен избор за тест-колекции на македонски и англиски јазик
Other Titles: Multiple-choice question answering system for Macedonian and English test-collections
Authors: Јовановска, Јасмина
Keywords: information retrieval, question answering, natural language processing, word class, word form, string similarity metrics, window functions
Issue Date: 2017
Publisher: FINKI, UKIM, Skopje
Source: Јовановска, Јасмина (2017). Систем за одговарање прашања со повеќекратен избор за тест-колекции на македонски и англиски јазик [Multiple-choice question answering system for Macedonian and English test-collections]. Doctoral dissertation. Skopje: FINKI, UKIM.
Abstract: Question Answering (QA) remains a very active research field, aiming to meet the varied demands of real users. The design of a modern question answering system requires the incorporation of engineering solutions from several areas, the most important being information retrieval, natural language processing, artificial intelligence and machine learning. Combining these techniques to build the most advantageous QA system is a very challenging task. Some of the difficulties arise from the specific properties of the natural language used to pose a particular question and of the language or languages in which the answer is searched. Therefore, the successful design of a QA system for a particular natural language requires an annotated corpus for knowledge extraction, lexical databases, and tools and approaches specific to that language. The main objective of this doctoral thesis is the development of a multiple-choice question answering system for Macedonian and English test-collections. In the absence of a test-collection for the Macedonian language, this research emphasizes the necessity of creating a test-collection that satisfies the standard prevalidation and postvalidation protocols, which are crucial for increasing question versatility, reliability and validity. Such a collection can be used to draw reliable conclusions from the results achieved using existing as well as new methods in the process of question answering. For all the experiments in this dissertation, a test-collection from the field of philosophy is used. It fully satisfies the standard prevalidation protocols. In order to build a successful QA system for the Macedonian language, this research focuses on identifying the morphological features of Macedonian that strongly influence information retrieval and question answering.
In particular, the first attempt is to determine the importance of the word class information of the query words, and how this information can be used to improve the retrieved results. In addition, three different strategies are tested for information retrieval: using only the query words, using all the word forms of a query word, and using all the words that share a stem with a query word (from the collection dictionary). This research also emphasizes the importance of word proximity in finding the correct answer to a particular question. Therefore, the designed QA system includes a window function. The final results confirm the positive influence of incorporating word forms when answering questions posed in Macedonian. Moreover, the implementation of the Hanning window function further improves system accuracy. Given the absence of a lemmatizer for Macedonian, this research uses a statistical approach to group words from the collection dictionary that belong to the same lexeme. For that purpose, a new string similarity metric based on the Triangular window is defined. Including the groups generated by this metric in the process of information retrieval (question answering) results in better system accuracy than using the manually created groups of word forms. Concerning the importance of the word class information, the final results confirm that this word characteristic is influential but not dominant in information retrieval and question answering. The designed QA system is tested on two additional test-collections from the field of information technology, written in two different languages: Macedonian and English. The detailed analysis of the overall achievements confirms that the system can be successfully used for answering questions posed in Macedonian from other fields.
Furthermore, the results for the English collection show that it can also be used effectively for answering questions posed in English. The conclusions pave the way for future system improvements that incorporate the syntactic features of the Macedonian language.
Description: Doctoral dissertation defended in 2017 at the Faculty of Computer Science and Engineering in Skopje, under the mentorship of Prof. Dr. Katerina Zdravkova.
URI: http://hdl.handle.net/20.500.12188/17588
Appears in Collections: UKIM 01: Dissertations preceding the Doctoral School / Дисертации пред Докторската школа

Files in This Item:
JasminaJovanovska2017.pdf (4.4 MB, Adobe PDF)
Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.