Please use this identifier to cite or link to this item: http://hdl.handle.net/20.500.12188/33563
Title: Combining Semantic Matching, Word Embeddings, Transformers, and LLMs for Enhanced Document Ranking: Application in Systematic Reviews
Authors: Mitrov, Goran
Stanoev, Boris
Gievska, Sonja 
Mirceva, Georgina
Zdravevski, Eftim 
Issue Date: 4-Sep-2024
Publisher: MDPI AG
Journal: Big Data and Cognitive Computing
Abstract: The rapid increase in scientific publications has made it challenging to keep up with the latest advancements. Conducting systematic reviews using traditional methods is both time-consuming and difficult. To address this, new review formats such as rapid and scoping reviews have been introduced, reflecting an urgent need for efficient information retrieval. This challenge extends beyond academia to many organizations where large numbers of documents must be reviewed against specific user queries. This paper focuses on improving document ranking to enhance the retrieval of relevant articles, thereby reducing the time and effort required of researchers. By applying a range of natural language processing (NLP) techniques, including rule-based matching, statistical text analysis, word embeddings, and transformer- and LLM-based approaches such as the Mistral LLM, we assess each article's similarity to user-specific inputs and rank the articles by relevance. We propose a novel methodology, Weighted Semantic Matching (WSM) + MiniLM, that combines the strengths of these approaches. For validation, we employ global metrics such as precision at K, recall at K, average rank, and median rank, as well as pairwise comparison metrics, including higher rank count, average rank difference, and median rank difference. Our proposed algorithm achieves the best performance, with an average recall at 1000 of 95% and an average median rank of 185 for selected articles across the five datasets evaluated. These results are promising for pinpointing relevant articles and reducing manual work.
URI: http://hdl.handle.net/20.500.12188/33563
DOI: 10.3390/bdcc8090110
Appears in Collections: Faculty of Computer Science and Engineering: Journal Articles
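
The global metrics named in the abstract (precision at K, recall at K, average rank, median rank) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy document IDs, and the choice to divide precision by K itself are assumptions made for illustration, and the paper may define the metrics with different conventions.

```python
from statistics import mean, median


def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k ranked documents that are relevant (assumed definition)."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / k


def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of all relevant documents that appear in the top k."""
    hits = sum(1 for doc_id in ranked_ids[:k] if doc_id in relevant_ids)
    return hits / len(relevant_ids)


def relevant_ranks(ranked_ids, relevant_ids):
    """1-based positions of the relevant documents within the ranking."""
    return [
        position
        for position, doc_id in enumerate(ranked_ids, start=1)
        if doc_id in relevant_ids
    ]


# Hypothetical toy ranking; the IDs are placeholders, not data from the paper.
ranked = ["d3", "d7", "d1", "d9", "d4", "d2"]
relevant = {"d7", "d4", "d2"}

ranks = relevant_ranks(ranked, relevant)
print("precision@5:", precision_at_k(ranked, relevant, 5))
print("recall@5:", recall_at_k(ranked, relevant, 5))
print("average rank:", mean(ranks))
print("median rank:", median(ranks))
```

Under these assumed definitions, a lower average or median rank means the relevant articles sit closer to the top of the list, while a higher recall at K means fewer relevant articles are missed when a reviewer reads only the first K results.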

Files in This Item:
File: BDCC-08-00110-v2.pdf (713.99 kB, Adobe PDF)