  • Item type: Publication
    PharmKE: Knowledge Extraction Platform for Pharmaceutical Texts using Transfer Learning
    (2021-02-25)
    Jofche, Nasi; …
    Named entity recognition has been a very dynamic field in recent years, driven by advances in neural network architectures, increases in computing power, and the availability of diverse labeled datasets, which deliver pre-trained, highly accurate models. These tasks generally focus on tagging common entities, but domain-specific use cases require tagging custom entities that are not part of the pre-trained models. This can be solved either by fine-tuning the pre-trained models or by training custom models. The main challenge lies in obtaining reliable labeled training and test datasets, since manual labeling is a highly tedious task. In this paper we present PharmKE, a text analysis platform focused on the pharmaceutical domain, which applies deep learning through several stages for thorough semantic analysis of pharmaceutical articles. It performs text classification using state-of-the-art transfer learning models and thoroughly integrates the results obtained through a proposed methodology. The methodology is used to create accurately labeled training and test datasets, which are then used to train models for custom entity labeling tasks centered on the pharmaceutical domain. The obtained results are compared to fine-tuned BERT and BioBERT models trained on the same dataset. Additionally, the PharmKE platform integrates the results of the named entity recognition tasks to resolve co-references of entities and analyze the semantic relations in every sentence, thus setting up a baseline for additional text analysis tasks, such as question answering and fact extraction. The recognized entities are also used to expand the knowledge graph generated by DBpedia Spotlight for a given pharmaceutical text.
  • Item type: Publication
    Performance Evaluation of Word and Sentence Embeddings for Finance Headlines Sentiment Analysis
    (Springer International Publishing, 2019)
    …; Gjorgjevikj, Ana; …; Vodenska, Irena
  • Item type: Publication
    PharmKE: Knowledge Extraction Platform for Pharmaceutical Texts Using Transfer Learning
    (MDPI, 2023-01-09)
    Jofche, Nasi; …
    Even though named entity recognition (NER) has seen tremendous development in recent years, some domain-specific use cases still require tagging unique entities, which pre-trained models do not handle well. Solutions based on enhancing pre-trained models or creating new ones are efficient, but creating reliably labeled training data for them to learn from is still challenging. In this paper, we introduce PharmKE, a text analysis platform tailored to the pharmaceutical industry that uses deep learning at several stages to perform an in-depth semantic analysis of relevant publications. The proposed methodology leverages cutting-edge transfer learning to produce reliably labeled datasets, which are later used to train models for specific entity labeling tasks. By building models for the well-known text-processing libraries spaCy and AllenNLP, this technique is used to find Pharmaceutical Organizations and Drugs in texts from the pharmaceutical domain. The PharmKE platform also incorporates the NER findings to resolve co-references of entities and examine the semantic linkages in each phrase, creating a foundation for further text analysis tasks, such as fact extraction and question answering. Additionally, the knowledge graph created by DBpedia Spotlight for a specific pharmaceutical text is expanded using the identified entities. The proposed methodology achieves an F1-score of about 96% on the NER tasks, up to 2% better than the fine-tuned BERT and BioBERT models developed using the same dataset. The ultimate benefit of the platform is that its visualization of the model findings makes it easier for pharmaceutical domain specialists to identify the knowledge extracted from the input texts. Likewise, the proposed techniques can be integrated into mobile and pervasive systems to give patients more relevant and comprehensive information from scanned medication guides. Similarly, they can provide preliminary insights to patients and even medical personnel on whether a drug from a different vendor is compatible with the patient’s prescription medication.
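The F1-score reported above is the harmonic mean of precision and recall over recognized entities. The following is a minimal sketch of entity-level scoring, assuming exact-match (start, end, label) spans; the matching scheme, the function name `entity_f1`, and the example spans are hypothetical illustrations, not the evaluation code used in the paper.

```python
def entity_f1(gold, predicted):
    """Exact-match entity-level precision, recall, and F1.

    gold, predicted: iterables of (start, end, label) tuples.
    A prediction counts as correct only if span and label both match.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical spans: one drug and one organization in a sentence.
gold = [(0, 7, "DRUG"), (22, 30, "ORG")]
pred = [(0, 7, "DRUG"), (22, 28, "ORG"), (40, 45, "DRUG")]
p, r, f1 = entity_f1(gold, pred)
print(f"P={p:.3f} R={r:.3f} F1={f1:.3f}")  # P=0.333 R=0.500 F1=0.400
```

Exact matching is the strictest choice; a partially overlapping span such as (22, 28, "ORG") earns no credit, which is why it lowers both precision and recall here.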
  • Item type: Publication
    ISO-standardized smart city platform architecture and dashboard
    (IEEE, 2017-03-31)
    …; Kocarev, Ljupco; …
    A concept guided by the ISO 37120 standard for city services and quality of life is proposed as a unified framework for smart city dashboards. The slow (annual, quarterly, or monthly) ISO 37120 indicators are enhanced and complemented with more detailed, person-centric indicators that can further accelerate the transition toward smart cities. The architecture supports three tasks: acquiring and managing data from heterogeneous sensors; processing data originating from heterogeneous sources (sensors, OpenData, social data, blogs, news, and so on); and implementing such collection and processing in the cloud. A prototype application based on the proposed architecture concept has been developed for the city of Skopje, Macedonia. This article is part of a special issue on smart cities.
  • Item type: Publication
    Publishing Skopje Air Quality Data as Linked Data
    (Faculty of Computer Science and Engineering, Skopje, 2015-04)
    Jovanovik, Milos; …; Kjosevski, Angjel; Kalemdzhievski, Nikola; Koteli, Nikola
    Publishing raw data as Linked Open Data makes the data reusable and understandable by machines. Air pollution is one of the biggest problems in the world today. The Republic of Macedonia, and especially its capital Skopje, has serious problems with PM2.5 and PM10 particles in the air, as confirmed by several measurement stations positioned at different locations in Skopje. In this paper, we demonstrate the process of centralizing all the data collected from the different measurement stations in one database. We also interpolate the collected data, using previously implemented eco models, to provide information about the current air quality in the areas between the measurement stations. The interpolated data are saved in the same database, which provides interfaces that transform the saved data into four-star and five-star data by reusing existing ontologies from the domain and linking the data to the physical places where the measurements were taken and the interpolations were calculated. As a use case scenario, we provide a heat map of the values of various pollutants across Skopje, showing which regions have problems with air pollution.
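The abstract does not describe the "eco models" used to estimate air quality between stations. As a stand-in only, the sketch below uses inverse distance weighting (IDW), a common spatial interpolation technique, to show how a value at an unmeasured point can be estimated from nearby stations. The function name `idw`, the planar coordinates, and the PM10 readings are all hypothetical.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse distance weighting estimate at point (x, y).

    stations: list of (sx, sy, value) tuples in a planar coordinate system.
    Closer stations receive larger weights (1 / distance ** power).
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0:
            return value  # query point coincides with a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


# Hypothetical PM10 readings (µg/m³) at three stations on a km grid.
stations = [(0.0, 0.0, 80.0), (4.0, 0.0, 40.0), (0.0, 3.0, 60.0)]
print(round(idw(stations, 1.0, 1.0), 1))  # 70.0
```

Evaluating such a function over a regular grid of points would yield the kind of values a heat map visualizes; the actual models in the paper may differ substantially.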
  • Item type: Publication
    Sentiment Analysis in Finance: From Transformers Back to eXplainable Lexicons (XLex)
    (2023-06-06)
    Rizinski, Maryan; Peshov, Hristijan; …; Jovanovik, Milos; …
    Lexicon-based sentiment analysis in finance leverages specialized, manually annotated lexicons created by human experts to effectively extract sentiment from financial texts. Although lexicon-based methods are simple to implement and fast to operate on textual data, they require considerable manual annotation efforts to create, maintain, and update the lexicons. These methods are also considered inferior to deep learning-based approaches, such as transformer models, which have become dominant in various natural language processing (NLP) tasks due to their remarkable performance. However, their efficacy comes at a cost: these models require extensive data and computational resources for both training and testing. Additionally, they involve significant prediction times, making them unsuitable for real-time production environments or systems with limited processing capabilities. In this paper, we introduce a novel methodology named eXplainable Lexicons (XLex) that combines the advantages of both lexicon-based methods and transformer models. We propose an approach that utilizes transformers and SHapley Additive exPlanations (SHAP) for explainability to automatically learn financial lexicons. Our study presents four main contributions. Firstly, we demonstrate that transformer-aided explainable lexicons can enhance the vocabulary coverage of the benchmark Loughran-McDonald (LM) lexicon. This enhancement leads to a significant reduction in the need for human involvement in the process of annotating, maintaining, and updating the lexicons. Secondly, we show that the resulting lexicon outperforms the standard LM lexicon in sentiment analysis of financial datasets. Thirdly, we illustrate that the lexicon-based approach is significantly more efficient in terms of model speed and size compared to transformers. Lastly, the proposed XLex approach is inherently more interpretable than transformer models. This interpretability is advantageous as lexicon models rely on predefined rules, unlike transformers, which have complex inner workings. The interpretability of the models allows for better understanding and insights into the results of sentiment analysis, making the XLex approach a valuable tool for financial decision-making.
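The speed and interpretability of lexicon-based methods come from the fact that classification reduces to word lookups. A minimal sketch, assuming a simple count-and-compare rule and a tiny hypothetical lexicon (not entries from Loughran-McDonald or XLex):

```python
import re

# Hypothetical miniature lexicon; real finance lexicons such as
# Loughran-McDonald contain thousands of expert-annotated entries.
POSITIVE = {"gain", "growth", "profit", "beat", "strong"}
NEGATIVE = {"loss", "decline", "weak", "lawsuit", "miss"}


def lexicon_sentiment(text):
    """Classify text as positive, negative, or neutral by lexicon word counts."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(lexicon_sentiment("Quarterly profit beat expectations despite weak sales"))
# positive
```

Each decision can be traced back to the specific matched words ("profit", "beat" versus "weak"), which is the kind of transparency the abstract contrasts with the inner workings of transformers.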
  • Item type: Publication
    Evaluation of Sentiment Analysis in Finance: From Lexicons to Transformers
    (Institute of Electrical and Electronics Engineers (IEEE), 2020-06)
    …; Gjorgjevikj, Ana; Vodenska, Irena; Chitkushev, Lubomir T.; …
    Financial and economic news is continuously monitored by financial market participants. According to the efficient market hypothesis, all past information is reflected in stock prices and new information is instantaneously absorbed in determining future stock prices. Hence, prompt extraction of positive or negative sentiments from news is very important for investment decision-making by traders, portfolio managers and investors. Sentiment analysis models can provide an efficient method for extracting actionable signals from the news. However, financial sentiment analysis is challenging due to domain-specific language and unavailability of large labeled datasets. General sentiment analysis models are ineffective when applied to specific domains such as finance. To overcome these challenges, we design an evaluation platform which we use to assess the effectiveness and performance of various sentiment analysis approaches, based on combinations of text representation methods and machine-learning classifiers. We perform more than one hundred experiments using publicly available datasets, labeled by financial experts. We start the evaluation with specific lexicons for sentiment analysis in finance and gradually build the study to include word and sentence encoders, up to the latest available NLP transformers. The results show improved efficiency of contextual embeddings in sentiment analysis compared to lexicons and fixed word and sentence encoders, even when large datasets are not available. Furthermore, distilled versions of NLP transformers produce comparable results to their larger teacher models, which makes them suitable for use in production environments.
  • Item type: Publication
    Ethically Responsible Machine Learning in Fintech
    (IEEE, 2022-08-29)
    Rizinski, Maryan; Peshov, Hristijan; …; Chitkushev, Ljubomir; Vodenska, Irena
    Rapid technological developments in the last decade have contributed to the use of machine learning (ML) in various economic sectors. Financial institutions have embraced technology and have applied ML algorithms in trading, portfolio management, and investment advising. Large-scale automation capabilities and cost savings make ML algorithms attractive for personal and corporate finance applications. Using ML applications in finance raises ethical issues that need to be carefully examined. We engage a group of experts in finance and ethics to evaluate the relationship between ethical principles of finance and ML. The paper compares the experts’ findings with the results obtained using natural language processing (NLP) transformer models, given their ability to capture semantic text similarity. The results reveal that the finance principles of integrity and fairness have the most significant relationships with ML ethics. The study includes a use case with the SHapley Additive exPlanations (SHAP) and Microsoft Responsible AI Widgets explainability tools for error analysis and visualization of ML models. It analyzes credit card approval data and demonstrates that the explainability tools can address ethical issues in fintech and improve transparency, thereby increasing the overall trustworthiness of ML models. The results show that both humans and machines could err in approving credit card requests despite using their best judgment based on the available information. Hence, human-machine collaboration could contribute to improved decision-making in finance. We propose a conceptual framework for addressing ethical challenges in fintech such as bias, discrimination, differential pricing, conflict of interest, and data protection.
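SHAP attributes a model's output to its input features using Shapley values from cooperative game theory. The sketch below computes exact Shapley values by brute force for a toy credit-scoring function, assuming that features outside a coalition are replaced by baseline values. The scoring function, feature names, applicant, and baseline are hypothetical; the SHAP library itself uses faster approximate or model-specific algorithms rather than this exponential enumeration.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values; features outside a coalition take baseline values."""
    n = len(x)

    def value(coalition):
        point = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(point)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            for subset in combinations(others, size):
                s = set(subset)
                # Classic Shapley weight: |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical linear credit score: income (k$), debt ratio, years employed.
def credit_score(f):
    income, debt_ratio, years = f
    return 0.02 * income - 1.5 * debt_ratio + 0.1 * years


applicant = [60.0, 0.4, 5.0]
baseline = [50.0, 0.3, 3.0]
phi = shapley_values(credit_score, applicant, baseline)
print([round(v, 3) for v in phi])  # [0.2, -0.15, 0.2]
```

The attributions sum to the difference between the applicant's score and the baseline score, which is what lets an analyst see how much each feature pushed a particular approval decision up or down.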
  • Item type: Publication
    Validation of language agnostic models for discourse marker detection
    (2023)
    Damova, Mariana; …; Valunaite Oleskeviciene, Giedre; Liebeskind, Chaya; da Purificação Silvano, Maria
    Using language models to detect or predict the presence of language phenomena in text has become a mainstream research topic. With the rise of generative models, experiments using deep learning and transformer models have attracted intense interest. Aspects such as prediction precision, portability to other languages or phenomena, and scale have been central to the research community. Discourse markers, as language phenomena, perform important functions, such as signposting, signalling, and rephrasing, by facilitating discourse organization. Our paper is about discourse marker detection, a complex task because it pertains to a language phenomenon manifested by expressions that can occur as content words in some contexts and as discourse markers in others. We adopted a language-agnostic model trained on English to predict the presence of discourse markers in texts in eight other languages unseen by the model, with the goal of evaluating how well the model performs on languages with different structural and lexical properties. We report on the evaluation and validation of the model's performance across European Portuguese, Hebrew, German, Polish, Romanian, Bulgarian, Macedonian, and Lithuanian, and on the results of this validation. This research is a key step towards multilingual language processing.
  • Item type: Publication
    Sentiment Analysis in Finance: From Transformers Back to eXplainable Lexicons (XLex)
    (Institute of Electrical and Electronics Engineers (IEEE), 2024-01)
    Rizinski, Maryan; Peshov, Hristijan; …
    Lexicon-based sentiment analysis in finance leverages specialized, manually annotated lexicons created by human experts to extract sentiment from financial texts effectively. Although lexicon-based methods are simple to implement and fast to operate on textual data, they require considerable manual annotation efforts to create, maintain, and update the lexicons. These methods are also considered inferior to the deep learning-based approaches, such as transformer models, which have become dominant in various natural language processing (NLP) tasks due to their remarkable performance. However, their efficacy comes at a cost: these models require extensive data and computational resources for both training and testing. Additionally, they involve significant prediction times, making them unsuitable for real-time production environments or systems with limited processing capabilities. In this paper, we introduce a novel methodology named eXplainable Lexicons (XLex) that combines the advantages of both lexicon-based methods and transformer models. We propose an approach that utilizes transformers and SHapley Additive exPlanations (SHAP) for explainability to automatically learn financial lexicons. Our study presents four main contributions. Firstly, we demonstrate that transformer-aided explainable lexicons can enhance the vocabulary coverage of the benchmark Loughran-McDonald (LM) lexicon. This enhancement leads to a significant reduction in the need for human involvement in the process of annotating, maintaining, and updating the lexicons. Secondly, we show that the resulting lexicon outperforms the standard LM lexicon in sentiment analysis of financial datasets. Our experiments show that XLex outperforms LM when applied to general financial texts, resulting in enhanced word coverage and an overall increase in classification accuracy by 0.431. Furthermore, by employing XLex to extend LM, we create a combined dictionary, XLex+LM, which achieves an even higher accuracy improvement of 0.450. Thirdly, we illustrate that the lexicon-based approach is significantly more efficient in terms of model speed and size compared to transformers. Lastly, the proposed XLex approach is inherently more interpretable than transformer models. This interpretability is advantageous as lexicon models rely on predefined rules, unlike transformers, which have complex inner workings. The interpretability of the models allows for better understanding and insights into the results of sentiment analysis, making the XLex approach a valuable tool for financial decision-making.
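The abstract states that transformers and SHAP are used to learn lexicons automatically but does not spell out how word-level attributions become lexicon entries. The sketch below shows one hypothetical aggregation rule: average each word's attribution scores across texts, discard rare words, and keep words whose mean exceeds a threshold in either direction. The function name `build_lexicon`, the thresholds, and the scores are illustrative assumptions, not the XLex procedure.

```python
from collections import defaultdict


def build_lexicon(attributions, min_count=2, threshold=0.1):
    """Turn per-token attribution scores into positive/negative word sets.

    attributions: iterable of (word, score) pairs pooled over many texts,
    where score > 0 pushes the model toward the positive class.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for word, score in attributions:
        key = word.lower()
        totals[key] += score
        counts[key] += 1
    positive, negative = set(), set()
    for word, total in totals.items():
        if counts[word] < min_count:
            continue  # too rare to trust the average
        mean = total / counts[word]
        if mean >= threshold:
            positive.add(word)
        elif mean <= -threshold:
            negative.add(word)
    return positive, negative


# Hypothetical token-level scores, e.g. exported from a SHAP explainer.
scores = [("surge", 0.4), ("Surge", 0.3), ("default", -0.5), ("default", -0.2),
          ("the", 0.01), ("the", -0.02), ("upgrade", 0.6)]
pos, neg = build_lexicon(scores)
print(sorted(pos), sorted(neg))  # ['surge'] ['default']
```

The minimum-count filter trades coverage for reliability: "upgrade" has a strong score but appears only once, so it is left out until more evidence accumulates.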