Faculty of Computer Science and Engineering

Permanent URI for this communityhttps://repository.ukim.mk/handle/20.500.12188/5

The Faculty of Computer Science and Engineering (FCSE) within UKIM is the largest and most prestigious faculty in the field of computer science and technologies in Macedonia, and among the largest faculties in that field in the region. The FCSE teaching staff consists of 50 professors and 30 associates. These include many “best in field” personnel, such as the most referenced scientists in Macedonia and the most influential professors in the ICT industry in the Republic of Macedonia.

Browse

Search Results

Now showing 1 - 4 of 4
  • Some of the metrics are blocked by your 
    Item type:Publication,
    Sentiment Analysis in Finance: From Transformers Back to eXplainable Lexicons (XLex)
    (Institute of Electrical and Electronics Engineers (IEEE), 2024-01)
    Rizinski, Maryan
    ;
    Peshov, Hristijan
    ;
    ;
    ;
    Lexicon-based sentiment analysis in finance leverages specialized, manually annotated lexicons created by human experts to extract sentiment from financial texts effectively. Although lexicon-based methods are simple to implement and fast to operate on textual data, they require considerable manual annotation efforts to create, maintain, and update the lexicons. These methods are also considered inferior to the deep learning-based approaches, such as transformer models, which have become dominant in various natural language processing (NLP) tasks due to their remarkable performance. However, their efficacy comes at a cost: these models require extensive data and computational resources for both training and testing. Additionally, they involve significant prediction times, making them unsuitable for real-time production environments or systems with limited processing capabilities. In this paper, we introduce a novel methodology named eXplainable Lexicons (XLex) that combines the advantages of both lexicon-based methods and transformer models. We propose an approach that utilizes transformers and SHapley Additive exPlanations (SHAP) for explainability to automatically learn financial lexicons. Our study presents four main contributions. Firstly, we demonstrate that transformer-aided explainable lexicons can enhance the vocabulary coverage of the benchmark Loughran-McDonald (LM) lexicon. This enhancement leads to a significant reduction in the need for human involvement in the process of annotating, maintaining, and updating the lexicons. Secondly, we show that the resulting lexicon outperforms the standard LM lexicon in sentiment analysis of financial datasets. Our experiments show that XLex outperforms LM when applied to general financial texts, resulting in enhanced word coverage and an overall increase in classification accuracy by 0.431. Furthermore, by employing XLex to extend LM, we create a combined dictionary, XLex+LM, which achieves an even higher accuracy improvement of 0.450. Thirdly, we illustrate that the lexicon-based approach is significantly more efficient in terms of model speed and size compared to transformers. Lastly, the proposed XLex approach is inherently more interpretable than transformer models. This interpretability is advantageous as lexicon models rely on predefined rules, unlike transformers, which have complex inner workings. The interpretability of the models allows for better understanding and insights into the results of sentiment analysis, making the XLex approach a valuable tool for financial decision-making.
  • Some of the metrics are blocked by your 
    Item type:Publication,
    Review of Natural Language Processing in Pharmacology
    (American Society for Pharmacology & Experimental Therapeutics (ASPET), 2023-07)
    ;
    Trajkovski, Vangel
    ;
    Dimitrieva, Makedonka
    ;
    Dobreva, Jovana
    ;
    Natural language processing (NLP) is an area of artificial intelligence that applies information technologies to process the human language, understand it to a certain degree, and use it in various applications. This area has rapidly developed in the past few years and now employs modern variants of deep neural networks to extract relevant patterns from large text corpora. The main objective of this work is to survey the recent use of NLP in the field of pharmacology. As our work shows, NLP is a highly relevant information extraction and processing approach for pharmacology. It has been used extensively, from intelligent searches through thousands of medical documents to finding traces of adversarial drug interactions in social media. We split our coverage into five categories to survey modern NLP: methodology, commonly addressed tasks, relevant textual data, knowledge bases, and useful programming libraries. We split each of the five categories into appropriate subcategories, describe their main properties and ideas, and summarize them in a tabular form. The resulting survey presents a comprehensive overview of the area, useful to practitioners and interested observers. SIGNIFICANCE STATEMENT: The main objective of this work is to survey the recent use of NLP in the field of pharmacology in order to provide a comprehensive overview of the current state in the area after the rapid developments that occurred in the past few years. The resulting survey will be useful to practitioners and interested observers in the domain.
  • Some of the metrics are blocked by your 
    Item type:Publication,
    Learning Robust Food Ontology Alignment
    (IEEE, 2022-12-17)
    Mijalcheva, Viktorija
    ;
    Davcheva, Ana
    ;
    ;
    ;
    In today’s knowledge society, large number of information systems use many different individual schemes to represent data. Ontologies are a promising approach for formal knowledge representation and their number is growing rapidly. The semantic linking of these ontologies is a necessary prerequisite for establishing interoperability between the large number of services that structure the data with these ontologies. Consequently, the alignment of ontologies becomes a central issue when building a worldwide Semantic Web. There is a need to develop automatic or at least semi-automatic techniques to reduce the burden of manually creating and maintaining alignments. Ontologies are seen as a solution to data heterogeneity on the Web. However, the available ontologies are themselves a source of heterogeneity. On the Web, there are multiple ontologies that refer to the same domain, and with that comes the challenge of a given graph-based system using multiple ontologies whose taxonomy is different, but the semantics are the same. This can be overcome by aligning the ontologies or by finding the correspondence between their components.In this paper, we propose a method for indexing ontologies as a support to a solution for ontology alignment based on a neural network. In this process, for each semantic resource we combine the graph based representations from the RDF2vec model, together with the text representation from the BERT model in order to capture the semantic and structural features. This methodology is evaluated using the FoodOn and OntoFood ontologies, based on the Food Onto Map alignment dataset, which contains 155 unique and validly aligned resources. Using these limited resources, we managed to obtain accuracy of 74% and F1 score of 75% on the test set, which is a promising result that can be further improved in future. Furthermore, the methodology presented in this paper is both robust and ontology-agnostic. It can be applied to any ontology, regardless of the domain.
  • Some of the metrics are blocked by your 
    Item type:Publication,
    PharmKE: Knowledge Extraction Platform for Pharmaceutical Texts Using Transfer Learning
    (MDPI, 2023-01-09)
    Jofche, Nasi
    ;
    ;
    ;
    ;
    Even though named entity recognition (NER) has seen tremendous development in recent years, some domain-specific use-cases still require tagging of unique entities, which is not well handled by pre-trained models. Solutions based on enhancing pre-trained models or creating new ones are efficient, but creating reliable labeled training for them to learn on is still challenging. In this paper, we introduce PharmKE, a text analysis platform tailored to the pharmaceutical industry that uses deep learning at several stages to perform an in-depth semantic analysis of relevant publications. The proposed methodology is used to produce reliably labeled datasets leveraging cutting-edge transfer learning, which are later used to train models for specific entity labeling tasks. By building models for the well-known text-processing libraries spaCy and AllenNLP, this technique is used to find Pharmaceutical Organizations and Drugs in texts from the pharmaceutical domain. The PharmKE platform also incorporates the NER findings to resolve co-references of entities and examine the semantic linkages in each phrase, creating a foundation for further text analysis tasks, such as fact extraction and question answering. Additionally, the knowledge graph created by DBpedia Spotlight for a specific pharmaceutical text is expanded using the identified entities. The obtained results with the proposed methodology result in about a 96% F1-score on the NER tasks, which is up to 2% better than those of the fine-tuned BERT and BioBERT models developed using the same dataset. The ultimate benefits of the platform are that pharmaceutical domain specialists may more easily identify the knowledge extracted from the input texts thanks to the platform’s visualization of the model findings. Likewise, the proposed techniques can be integrated into mobile and pervasive systems to give patients more relevant and comprehensive information from scanned medication guides. Similarly, it can provide preliminary insights to patients and even medical personnel on whether a drug from a different vendor is compatible with the patient’s prescription medication.