Faculty of Computer Science and Engineering
Permanent URI for this community: https://repository.ukim.mk/handle/20.500.12188/5
The Faculty of Computer Science and Engineering (FCSE) within UKIM is the largest and most prestigious faculty in the field of computer science and technologies in Macedonia, and among the largest
faculties in that field in the region.
The FCSE teaching staff consists of 50 professors and 30 associates. These include many “best in field” personnel, such as the most referenced scientists in Macedonia and the most influential professors in the ICT industry in the Republic of Macedonia.
6 results
Item type: Publication, Evaluating a Nationally Localized AI Chatbot for Personalized Primary Care Guidance: Insights from the HomeDOCtor Deployment in Slovenia (MDPI AG, 2025-07-29); Gams, Matjaž; Horvat, Tadej; Kolar, Žiga; Kocuvan, Primož
Background/Objectives: The demand for accessible and reliable digital health services has increased significantly in recent years, particularly in regions facing physician shortages. HomeDOCtor, a conversational AI platform developed in Slovenia, addresses this need with a nationally adapted architecture that combines retrieval-augmented generation (RAG) and a Redis-based vector database of curated medical guidelines. The objective of this study was to assess the performance and impact of HomeDOCtor in providing AI-powered healthcare assistance. Methods: HomeDOCtor is designed for human-centered communication and clinical relevance, supporting multilingual and multimedia citizen inputs while being available 24/7. It was tested using a set of 100 international clinical vignettes and 150 internal medicine exam questions from the University of Ljubljana to validate its clinical performance. Results: During its six-month nationwide deployment, HomeDOCtor received overwhelmingly positive user feedback with minimal criticism, and exceeded initial expectations, especially in light of widespread media narratives warning about the risks of AI. HomeDOCtor autonomously delivered localized, evidence-based guidance, including self-care instructions and referral suggestions, with average response times under three seconds. On international benchmarks, the system achieved ≥95% Top-1 diagnostic accuracy, comparable to leading medical AI platforms, and significantly outperformed stand-alone ChatGPT-4o in the national context (90.7% vs. 80.7%, p = 0.0135).
Conclusions: Practically, HomeDOCtor eases the burden on healthcare professionals by providing citizens with 24/7 autonomous, personalized triage and self-care guidance for less complex medical issues, ensuring that these cases are self-managed efficiently. The system also identifies more serious cases that might otherwise be neglected, directing them to professionals for appropriate care. Theoretically, HomeDOCtor demonstrates that domain-specific, nationally adapted large language models can outperform general-purpose models. Methodologically, it offers a framework for integrating GDPR-compliant AI solutions in healthcare. These findings emphasize the value of localization in conversational AI and telemedicine solutions across diverse national contexts.
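As a rough arithmetic check on the national-context comparison in the HomeDOCtor abstract (90.7% vs. 80.7%, p = 0.0135), the sketch below runs a pooled two-proportion z-test. It assumes both systems were scored on the same 150 exam questions, with 136 and 121 correct answers respectively. These counts are inferred from the rounded percentages, and the abstract does not say which statistical test the authors used, so this is only one plausible reconstruction.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Return (z, two-sided p-value) for a pooled two-proportion z-test."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    # Pooled proportion under the null hypothesis that both rates are equal.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Assumed counts (not stated in the abstract): 136/150 and 121/150 correct.
z, p_value = two_proportion_z_test(136, 150, 121, 150)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```

Under these assumed counts the p-value lands close to the reported 0.0135, which suggests the figures are mutually consistent but does not confirm the authors' method.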
Item type: Publication, Exploring the Potential of Topological Data Analysis for Explainable Large Language Models: A Scoping Review (Zenodo, 2026); Sekuloski, Petar; Kitanovski, Dimitar
Large language models (LLMs) have become central to modern artificial intelligence, yet their internal decision-making processes remain difficult to interpret. As interest grows in making these models more transparent and reliable, topological data analysis (TDA) has emerged as a promising mathematical approach for exploring their structure. This scoping review maps the current landscape of research where TDA tools, such as persistent homology and Mapper, are used to examine LLM components like attention patterns, latent representations, and training dynamics. By analyzing topological features across layers and tasks, these methods provide new ways to understand how language models generalize, respond to unfamiliar inputs, and shift under fine-tuning. The review also considers how TDA-based techniques contribute to broader goals in interpretability and robustness, especially in detecting hallucinations, out-of-distribution behavior, and representational collapse. Overall, the findings suggest that TDA offers a rigorous and versatile framework for studying LLMs, helping researchers uncover deeper patterns in how these models learn and reason.
Item type: Publication, Enhancing LLMs with LoRA Fine-Tuning Using Medical Data and Knowledge Graph Enrichment for Improved Healthcare Outcomes (IEEE, 2025-06-02); Jankov, A.
This research paper investigates the enhancement of large language models (LLMs) within the medical domain, focusing on members of the Llama family of LLMs. While LLMs have demonstrated remarkable success across various general-purpose natural language processing tasks, their application in specialized domains like medicine is often hindered by limited training on domain-specific data, resulting in suboptimal accuracy and contextual relevance. To address these limitations, this research employs low-rank adaptation (LoRA) to fine-tune LLMs on real-world patient-physician dialogues, effectively capturing the intricacies of medical discourse. Additionally, the knowledge of the LLM is enriched with the SPOKE knowledge graph, a structured repository of medical domain information, allowing the model to generate outputs that are both contextually and scientifically grounded. The experimental results underscore the transformative impact of this dual approach, demonstrating significant advancements in tasks such as automatic diagnosis generation and personalized drug recommendation. However, this research should be viewed as an exploratory proof of concept. Significant limitations, including the constrained evaluation scope and the critical need for expert clinical validation and thorough ethical review, must be addressed in future work before considering real-world applicability.
Item type: Publication, Exploring Large Language Models for Data Augmentation: A Case Study for Text Style Transfer (IEEE, 2025-06-02)
Text style transfer is the task of modifying a sentence to match a desired target style while preserving its original meaning. It often requires high-quality parallel datasets that are not always available. This paper explores data augmentation techniques for text style transfer, leveraging large language models (LLMs) to address the challenge of dataset scarcity. Our approach generates synthetic parallel data by prompting LLMs to paraphrase and/or rewrite sentences in diverse styles, enabling the creation of larger and more varied datasets. We demonstrate the applicability of this approach across three tasks: formality transfer with the GYAFC dataset, sentiment transfer with the Yelp dataset, and personal style transfer with the Shakespeare dataset. This work introduces an approach to enhance dataset availability, aiming to foster further research in the field and support a broader application of LLMs. The experiments were performed only with English-language datasets.
Item type: Publication, Preserving Macedonian Culinary Heritage: Fine-Tuning a Large Language Model for Recipe Generation in a Low-Resource Language (IEEE, 2025-12-08); Peshevski, Dimitar; Sasanski, Darko
We introduce the first fine-tuned large language model for recipe instruction generation in Macedonian. Building on VezilkaLLM-Instruct, a 4-billion-parameter model, we fine-tune it using a curated dataset of 36,000 recipes with detailed cooking instructions. Our key contributions include: (1) the development of a domain-adapted language model for a low-resource language; (2) the demonstration that relatively small LLMs can be effectively adapted to specialized culinary tasks; and (3) the proposal of a dual evaluation framework that combines semantic similarity and verb overlap analyses to assess both content generalization and procedural accuracy. Fine-tuning results in a mean cosine similarity of 0.90 and significantly increases the overlap of domain-specific cooking verbs, indicating improved generation quality. These results highlight the potential of targeted fine-tuning approaches for domain-specific applications in underrepresented languages and provide a foundation for further research in computational culinary heritage.
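The dual evaluation mentioned in the recipe-generation abstract pairs an embedding-based similarity score with a check on cooking verbs. The following is a minimal sketch of that idea, assuming cosine similarity over precomputed embedding vectors and a Jaccard-style overlap over a small verb list. The function names, the English verb list, and the Jaccard formulation are illustrative assumptions; the abstract does not specify how either measure is computed.

```python
import math
import re


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Similarity is undefined for a zero vector; report 0.
    return dot / (norm_u * norm_v)


# Hypothetical verb list, in English for readability (the paper targets Macedonian).
COOKING_VERBS = {"chop", "boil", "fry", "bake", "stir", "simmer", "mix"}


def verb_overlap(generated, reference, verbs=COOKING_VERBS):
    """Jaccard overlap of the cooking verbs found in two texts (exact word match, no lemmatization)."""
    gen = {w for w in re.findall(r"[a-z]+", generated.lower()) if w in verbs}
    ref = {w for w in re.findall(r"[a-z]+", reference.lower()) if w in verbs}
    union = gen | ref
    if not union:
        return 0.0  # Neither text contains a listed verb.
    return len(gen & ref) / len(union)


generated = "Chop the onions, fry them, then simmer the sauce."
reference = "Chop the onions and fry until golden. Bake for ten minutes."
overlap = verb_overlap(generated, reference)
similarity = cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
print(f"verb overlap = {overlap:.2f}, cosine similarity = {similarity:.2f}")
```

In practice the vectors would come from a sentence-embedding model and the verbs from a lemmatized Macedonian lexicon; both are left out here to keep the sketch self-contained.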
Item type: Publication, Advancing AI in Higher Education: A Comparative Study of Large Language Model-Based Agents for Exam Question Generation, Improvement, and Evaluation (MDPI AG, 2025-03-04); Nikolovski, Vlatko
The transformative capabilities of large language models (LLMs) are reshaping educational assessment and question design in higher education. This study proposes a systematic framework for leveraging LLMs to enhance question-centric tasks: aligning exam questions with course objectives, improving clarity and difficulty, and generating new items guided by learning goals. The research spans four university courses (two theory-focused and two application-focused), covering diverse cognitive levels according to Bloom’s taxonomy. A balanced dataset ensures representation of question categories and structures. Three LLM-based agents (VectorRAG, VectorGraphRAG, and a fine-tuned LLM) are developed and evaluated against a meta-evaluator, supervised by human experts, to assess alignment accuracy and explanation quality. Robust analytical methods, including mixed-effects modeling, yield actionable insights for integrating generative AI into university assessment processes. Beyond exam-specific applications, this methodology provides a foundational approach for the broader adoption of AI in post-secondary education, emphasizing fairness, contextual relevance, and collaboration. The findings offer a comprehensive framework for aligning AI-generated content with learning objectives, detailing effective integration strategies, and addressing challenges such as bias and contextual limitations. Overall, this work underscores the potential of generative AI to enhance educational assessment while identifying pathways for responsible implementation.
